# 📊 First Look at Data: London Temperature Records

**DS105W W01 NB02 – Data for Data Science (Winter Term 2025/2026)**

<div style="font-family: system-ui; padding: 20px 30px 20px 20px; background-color: #FFFFFF; border-left: 8px solid #ED9255; border-radius: 8px; box-shadow: 0 4px 12px rgba(0, 0, 0, 0.1);max-width:600px;color:#212121;">

**Student Notebook**
- 📅 Date: [Add today's date here]
- 👤 Name: [Add your name here]
- 🎯 Purpose: Explore raw daily temperature data and understand data granularity

<span style="display:block;line-height:1.15em;color:#666666;font-size:0.9em;">

🥅 **Learning Goals**
 i) See what raw daily temperature data looks like,
 ii) Understand how this connects to the yearly data from W01 Practice,
 iii) Experience what pandas can do with tabular data,
 iv) Think about how to transform daily data into aggregated insights.

</span>

</div>

⚙️ **Importing additional libraries**

(other than those that already come with Python)

In [2]:
import pandas as pd

## Section 1: From Daily Data to Yearly Insights

In üìù [**W01 Practice**](https://lse-dsi.github.io/DS105/2025-2026/winter-term/practice/week01.html), you worked with yearly heatwave counts. But where does that aggregated data come from? It starts with daily temperature measurements like the ones we'll explore today.

**Today's data:** Daily maximum temperatures for London (1990-2025)

**W01 Practice data:** Yearly counts of heatwaves

By the end of today, you'll understand how we could transform one into the other.

In [3]:
# Load daily temperature data for London
df = pd.read_csv('./data/london_max_temp_1990_mid_2025.csv')

print("Data loaded successfully!")
print(f"Dataset shape: {df.shape}")

Data loaded successfully!
Dataset shape: (13042, 2)
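
The shape tells us there are two columns, but not how pandas stored them. The sketch below is a small illustration rather than part of the notebook's workflow: it rebuilds the first five rows shown by `df.head()` as an in-memory CSV and compares the default load with one that parses dates. It assumes the file has a header row `date,max_temp_c`, which we cannot confirm without opening the file.

```python
import io

import pandas as pd

# First five rows as shown by df.head(); the header row is an assumption about the CSV layout
csv_text = """date,max_temp_c
1990-01-01,6.0
1990-01-02,7.0
1990-01-03,6.0
1990-01-04,6.6
1990-01-05,9.7
"""

# Default load: the date column is read as plain text
as_text = pd.read_csv(io.StringIO(csv_text))

# Asking pandas to parse the column gives real datetime values instead
as_dates = pd.read_csv(io.StringIO(csv_text), parse_dates=["date"])

print(as_text.dtypes)
print(as_dates.dtypes)
```

Keeping dates as text is why simple string matching on the `date` column works later in this notebook; parsed dates allow calendar-aware operations instead.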


### 1.1 What Does This Data Look Like?

In [4]:
# First few rows
df.head()

|  | date | max_temp_c |
|---|---|---|
| 0 | 1990-01-01 | 6.0 |
| 1 | 1990-01-02 | 7.0 |
| 2 | 1990-01-03 | 6.0 |
| 3 | 1990-01-04 | 6.6 |
| 4 | 1990-01-05 | 9.7 |


In [5]:
# Last few rows
df.tail()

|  | date | max_temp_c |
|---|---|---|
| 13037 | 2025-09-11 | 18.1 |
| 13038 | 2025-09-12 | 17.7 |
| 13039 | 2025-09-13 | 17.3 |
| 13040 | 2025-09-14 | 17.9 |
| 13041 | 2025-09-15 | 17.9 |


In [None]:
print(f"We have {len(df)} days of temperature data")
print(f"That's roughly {len(df)/365:.1f} years of daily measurements")
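
Dividing by 365 is a rough estimate. As an optional sanity check, the sketch below counts the calendar days between the first and last dates shown by `df.head()` and `df.tail()`. If that count equals the number of rows, it suggests (but does not prove) that the dataset has one row per day with no gaps.

```python
import pandas as pd

# First and last dates as shown by df.head() and df.tail()
first_day = pd.Timestamp("1990-01-01")
last_day = pd.Timestamp("2025-09-15")

# Calendar days in the range, counting both endpoints
calendar_days = (last_day - first_day).days + 1
print(calendar_days)  # 13042, the same as the number of rows

# Using 365.25 days per year accounts for leap years
print(f"{calendar_days / 365.25:.1f} years")  # 35.7 years
```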

**Q1:** How is this different from the yearly heatwave data you saw in 📝 [**W01 Practice**](https://lse-dsi.github.io/DS105/2025-2026/winter-term/practice/week01.html)?

> _[Your answer here]_

## Section 2: Finding Hot Days (28°C or Above)

Remember from the 📝 [**W01 Practice**](https://lse-dsi.github.io/DS105/2025-2026/winter-term/practice/week01.html): [by definition](https://weather.metoffice.gov.uk/learn-about/weather/types-of-weather/temperature/heatwave), a heatwave in London is when you see a streak of at least three consecutive days where the maximum temperature is 28°C or higher.


Let's find these hot days in our daily data.

In [6]:
# Basic info about our temperature data
print(f"Highest temperature: {df['max_temp_c'].max()}°C")
print(f"Lowest temperature: {df['max_temp_c'].min()}°C")
print(f"Average temperature: {df['max_temp_c'].mean():.1f}°C")

Highest temperature: 38.0°C
Lowest temperature: -4.1°C
Average temperature: 14.5°C


In [7]:
# Find all days with temperature 28°C or above
hot_days = df[df['max_temp_c'] >= 28]

print(f"Total days in dataset: {len(df)}")
print(f"Hot days (≥28°C): {len(hot_days)}")
print(f"That's {len(hot_days)/len(df)*100:.1f}% of all days in the dataset")

Total days in dataset: 13042
Hot days (≥28°C): 160
That's 1.2% of all days in the dataset


In [8]:
# Look at some hot days
print("First 10 hot days in our data:")
hot_days[['date', 'max_temp_c']].head(10)

First 10 hot days in our data:


|  | date | max_temp_c |
|---|---|---|
| 200 | 1990-07-20 | 28.5 |
| 201 | 1990-07-21 | 28.6 |
| 212 | 1990-08-01 | 29.6 |
| 213 | 1990-08-02 | 31.0 |
| 214 | 1990-08-03 | 33.8 |
| 215 | 1990-08-04 | 32.7 |
| 1653 | 1994-07-12 | 30.4 |
| 1665 | 1994-07-24 | 28.9 |
| 1671 | 1994-07-30 | 28.1 |
| 2006 | 1995-06-30 | 29.3 |


In [9]:
# The hottest days on record
hottest_days = hot_days.sort_values('max_temp_c', ascending=False)
print("Top 10 hottest days on record:")
hottest_days[['date', 'max_temp_c']].head(10)

Top 10 hottest days on record:


|  | date | max_temp_c |
|---|---|---|
| 11887 | 2022-07-19 | 38.0 |
| 10797 | 2019-07-25 | 35.3 |
| 11886 | 2022-07-18 | 35.1 |
| 4969 | 2003-08-10 | 34.8 |
| 11169 | 2020-07-31 | 34.0 |
| 11176 | 2020-08-07 | 33.9 |
| 214 | 1990-08-03 | 33.8 |
| 12965 | 2025-07-01 | 33.7 |
| 11912 | 2022-08-13 | 33.0 |
| 11180 | 2020-08-11 | 32.9 |


**Q2:** What fraction of all days are “hot days” (≥ 28°C)?


> _[Your answer here]_

**Q3:** Do you notice any obvious patterns in the dates of the hot days?

> _[Your answer here]_

## Section 3: Work on the Data Science Workflow 'by hand'

**GOAL:** How would you transform today's daily temperature data into the yearly heatwave counts you saw in W01 Practice? Try to relate your answer to the [Data Science Workflow](https://lse-dsi.github.io/DS105/2025-2026/winter-term/guides/data-science-workflow.html) we discussed in the first lecture.

**HOW:** Describe _precisely_ what you would do using numbered bullet points. Step-by-step.

<details><summary>Click to see an example of how you could get started.</summary>

<div style="font-size:0.65em;width:50%">

For example:

```markdown
1. Read the daily temperature data in full

2. Identify hot days (≥ 28°C) by adding a `*` to each line that meets the criteria

```

</div>

</details>

> [_Write your answers here_]
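
Once you have written your own steps, you may want to compare them with one possible translation into pandas. The sketch below is only one way to do it, not the method used to build the W01 Practice data. The function name `count_heatwaves_per_year` and its parameters are hypothetical, and it makes two assumptions: a heatwave is any run of at least three consecutive calendar days at or above 28°C, and each heatwave is counted in the year it starts. The sample rows are the first ten hot days listed in Section 2.

```python
import pandas as pd


def count_heatwaves_per_year(daily, threshold=28.0, min_days=3):
    """Count runs of at least `min_days` consecutive hot days, per starting year."""
    # Step 1: keep only the hot days and make sure dates are real datetimes
    hot = daily.loc[daily["max_temp_c"] >= threshold].copy()
    hot["date"] = pd.to_datetime(hot["date"])
    hot = hot.sort_values("date")

    # Step 2: a new streak starts whenever the gap to the previous hot day
    # is not exactly one day (the first row always starts a streak)
    starts_new_streak = hot["date"].diff() != pd.Timedelta(days=1)
    hot["streak_id"] = starts_new_streak.cumsum()

    # Step 3: measure each streak and keep the ones that are long enough
    streaks = hot.groupby("streak_id")["date"].agg(start="min", length="count")
    heatwaves = streaks[streaks["length"] >= min_days]

    # Step 4: count heatwaves by the year in which they start
    return heatwaves["start"].dt.year.value_counts().sort_index()


# First ten hot days from Section 2 (not the full dataset)
hot_sample = pd.DataFrame({
    "date": [
        "1990-07-20", "1990-07-21", "1990-08-01", "1990-08-02", "1990-08-03",
        "1990-08-04", "1994-07-12", "1994-07-24", "1994-07-30", "1995-06-30",
    ],
    "max_temp_c": [28.5, 28.6, 29.6, 31.0, 33.8, 32.7, 30.4, 28.9, 28.1, 29.3],
})

print(count_heatwaves_per_year(hot_sample))
```

In this sample only the four-day run from 1990-08-01 to 1990-08-04 qualifies; the two-day run in July 1990 and the isolated hot days in 1994 and 1995 are too short.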

---

## (Optional) Section 4: Exploring Different Temperature Ranges

*Complete this section after class if you want more hands-on experience with the data. Reading this code, even if you don't fully understand it yet, will help you internalise the concepts coming next week.*

Let's explore different temperature categories to get a fuller picture of London's climate.

In [10]:
# Count days by temperature ranges
freezing_days = df[df['max_temp_c'] <= 0]
cold_days = df[(df['max_temp_c'] > 0) & (df['max_temp_c'] <= 10)]
mild_days = df[(df['max_temp_c'] > 10) & (df['max_temp_c'] <= 20)]
warm_days = df[(df['max_temp_c'] > 20) & (df['max_temp_c'] < 28)]
hot_days = df[df['max_temp_c'] >= 28]

print("Days by temperature category:")
print(f"Freezing (≤0°C): {len(freezing_days)} days")
print(f"Cold (>0 to 10°C): {len(cold_days)} days")
print(f"Mild (>10 to 20°C): {len(mild_days)} days")
print(f"Warm (>20 to <28°C): {len(warm_days)} days")
print(f"Hot (≥28°C): {len(hot_days)} days")

Days by temperature category:
Freezing (≤0°C): 30 days
Cold (>0 to 10°C): 3311 days
Mild (>10 to 20°C): 7088 days
Warm (>20 to <28°C): 2453 days
Hot (≥28°C): 160 days
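
The five filters above can also be expressed as a single labelling step, so that every day gets exactly one category. The sketch below uses `numpy.select` with the same boundaries as the cell above. The helper name `label_temperature` and the `"Unknown"` fallback for missing values are assumptions made for illustration, and the sample values are hand-picked to include the boundaries.

```python
import numpy as np
import pandas as pd


def label_temperature(temps):
    """Assign each temperature to one category, using the same cut-offs as above."""
    conditions = [
        temps <= 0,
        (temps > 0) & (temps <= 10),
        (temps > 10) & (temps <= 20),
        (temps > 20) & (temps < 28),
        temps >= 28,
    ]
    labels = ["Freezing", "Cold", "Mild", "Warm", "Hot"]
    # Values matching no condition (for example, missing readings) become "Unknown"
    categories = np.select(conditions, labels, default="Unknown")
    return pd.Series(categories, index=temps.index, name="category")


# Hand-picked values, including the category boundaries
sample = pd.Series([-4.1, 0.0, 6.0, 10.0, 14.5, 20.0, 21.3, 27.9, 28.0, 38.0])
print(label_temperature(sample).value_counts())
```

With a `category` column in place, counting days per category becomes a single `value_counts()` call instead of five separate filters.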


In [11]:
# Look at some examples of cold days
print("Some of the coldest days:")
coldest_examples = df.sort_values('max_temp_c')
coldest_examples[['date', 'max_temp_c']].head(10)

Some of the coldest days:


|  | date | max_temp_c |
|---|---|---|
| 7641 | 2010-12-03 | -4.1 |
| 404 | 1991-02-09 | -3.5 |
| 402 | 1991-02-07 | -3.5 |
| 403 | 1991-02-08 | -2.5 |
| 2216 | 1996-01-26 | -2.2 |
| 2559 | 1997-01-03 | -1.4 |
| 2557 | 1997-01-01 | -1.3 |
| 2556 | 1996-12-31 | -1.1 |
| 7640 | 2010-12-02 | -1.0 |
| 8418 | 2013-01-18 | -0.8 |


In [12]:
# Look at some mild days
print("Some mild days (around 15°C):")
mild_examples = df[(df['max_temp_c'] >= 14) & (df['max_temp_c'] <= 16)]
mild_examples[['date', 'max_temp_c']].head(10)

Some mild days (around 15°C):


|  | date | max_temp_c |
|---|---|---|
| 50 | 1990-02-20 | 14.5 |
| 52 | 1990-02-22 | 14.0 |
| 54 | 1990-02-24 | 14.1 |
| 68 | 1990-03-10 | 16.0 |
| 69 | 1990-03-11 | 14.9 |
| 74 | 1990-03-16 | 15.3 |
| 77 | 1990-03-19 | 14.0 |
| 78 | 1990-03-20 | 14.7 |
| 79 | 1990-03-21 | 14.7 |
| 88 | 1990-03-30 | 15.5 |


**Q4:** Which temperature category has the most days? Does this surprise you for London's climate?

> _[Your answer here]_

---

## (Optional) Section 5: Seasonal Patterns

*This section gives you a preview of working with dates - you'll learn more about this in future weeks.*

In [13]:
# Let's look at temperature by different months
# We'll just look at a few examples rather than doing complex analysis

print("Some July temperatures (summer):")
july_temps = df[df['date'].str.contains('-07-')]  # Simple way to find July dates
july_temps[['date', 'max_temp_c']].head(10)

Some July temperatures (summer):


|  | date | max_temp_c |
|---|---|---|
| 181 | 1990-07-01 | 18.3 |
| 182 | 1990-07-02 | 17.0 |
| 183 | 1990-07-03 | 18.0 |
| 184 | 1990-07-04 | 17.3 |
| 185 | 1990-07-05 | 17.8 |
| 186 | 1990-07-06 | 17.9 |
| 187 | 1990-07-07 | 21.3 |
| 188 | 1990-07-08 | 23.2 |
| 189 | 1990-07-09 | 17.8 |
| 190 | 1990-07-10 | 17.9 |


In [14]:
print("Some January temperatures (winter):")
january_temps = df[df['date'].str.contains('-01-')]  # Simple way to find January dates  
january_temps[['date', 'max_temp_c']].head(10)

Some January temperatures (winter):


|  | date | max_temp_c |
|---|---|---|
| 0 | 1990-01-01 | 6.0 |
| 1 | 1990-01-02 | 7.0 |
| 2 | 1990-01-03 | 6.0 |
| 3 | 1990-01-04 | 6.6 |
| 4 | 1990-01-05 | 9.7 |
| 5 | 1990-01-06 | 10.7 |
| 6 | 1990-01-07 | 9.1 |
| 7 | 1990-01-08 | 8.9 |
| 8 | 1990-01-09 | 9.9 |
| 9 | 1990-01-10 | 10.5 |


In [15]:
# Compare summer vs winter averages (basic approach)
july_avg = july_temps['max_temp_c'].mean()
january_avg = january_temps['max_temp_c'].mean()

print(f"Average July temperature: {july_avg:.1f}°C")
print(f"Average January temperature: {january_avg:.1f}°C")
print(f"Difference: {july_avg - january_avg:.1f}°C")

Average July temperature: 22.0°C
Average January temperature: 7.5°C
Difference: 14.5°C
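
Matching `'-07-'` as text works because every date is written as `YYYY-MM-DD`. As a preview of the date handling mentioned at the start of this section, the sketch below shows an alternative that does not depend on the text layout: convert the column with `pd.to_datetime` and group by `.dt.month`. It uses only the first five January and July 1990 rows shown above, so the averages differ from the full-dataset figures.

```python
import pandas as pd

# First five January and July 1990 rows from the outputs above (not the full dataset)
sample = pd.DataFrame({
    "date": [
        "1990-01-01", "1990-01-02", "1990-01-03", "1990-01-04", "1990-01-05",
        "1990-07-01", "1990-07-02", "1990-07-03", "1990-07-04", "1990-07-05",
    ],
    "max_temp_c": [6.0, 7.0, 6.0, 6.6, 9.7, 18.3, 17.0, 18.0, 17.3, 17.8],
})

# Turn the text dates into real datetime values
sample["date"] = pd.to_datetime(sample["date"])

# Group by calendar month number (1 = January, 7 = July) and average
monthly_avg = sample.groupby(sample["date"].dt.month)["max_temp_c"].mean()
print(monthly_avg.round(2))
```

The same two lines would cover all twelve months of the full dataset at once, rather than one filter per month.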


**Q5:** Based on these examples, what can you tell about London's seasonal temperature patterns?

> _[Your answer here]_

**Q6:** If you wanted to identify which months have the most hot days (≥28°C), how would you approach this? Think about the steps.

> _[Your answer here]_
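
After writing your own steps for Q6, you could compare them with the sketch below. It is one possible approach, not the expected answer: filter to hot days, extract the month name from each date, and count. It runs on the first ten hot days listed in Section 2, so the counts are illustrative only.

```python
import pandas as pd

# First ten hot days from Section 2 (not the full dataset)
hot_sample = pd.DataFrame({
    "date": [
        "1990-07-20", "1990-07-21", "1990-08-01", "1990-08-02", "1990-08-03",
        "1990-08-04", "1994-07-12", "1994-07-24", "1994-07-30", "1995-06-30",
    ],
    "max_temp_c": [28.5, 28.6, 29.6, 31.0, 33.8, 32.7, 30.4, 28.9, 28.1, 29.3],
})
hot_sample["date"] = pd.to_datetime(hot_sample["date"])

# Step 1: keep only days at or above the 28°C threshold
is_hot = hot_sample["max_temp_c"] >= 28

# Step 2: replace each date with its month name, then count occurrences
hot_per_month = hot_sample.loc[is_hot, "date"].dt.month_name().value_counts()
print(hot_per_month)
```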

## Final Reflection

**Q7:** What was the most interesting thing you discovered about London's temperature data today?

> _[Your answer here]_