We've included 2 solutions here. There are also many other ways of formatting this data. Don't wory if you haven't done it exactly like we have.... this is just guide and not a bible.

In [2]:
import pandas as pd
import requests
from datetime import datetime
import pytz

# Solution using a for loop

First we need to take a look at the JSON so we know what we're dealing with.
- We can view what the JSON contains by checking out the [documentation on the OWM website](https://openweathermap.org/forecast5)
- Or this can be done manually, like below:

In [2]:
city = 'Berlin'
API_key = '4fe53ee5e34a7d900ed58bd74bbbb0b7'

# check out the docs for more info on making an api call https://openweathermap.org/forecast5
url = (f"http://api.openweathermap.org/data/2.5/forecast?q={city}&appid={API_key}&units=metric")

response = requests.get(url)
json = response.json()

json

{'cod': '200',
 'message': 0,
 'cnt': 40,
 'list': [{'dt': 1660986000,
   'main': {'temp': 19.97,
    'feels_like': 20.37,
    'temp_min': 18.71,
    'temp_max': 19.97,
    'pressure': 1010,
    'sea_level': 1010,
    'grnd_level': 1009,
    'humidity': 90,
    'temp_kf': 1.26},
   'weather': [{'id': 801,
     'main': 'Clouds',
     'description': 'few clouds',
     'icon': '02d'}],
   'clouds': {'all': 20},
   'wind': {'speed': 3.07, 'deg': 309, 'gust': 7.01},
   'visibility': 10000,
   'pop': 0.36,
   'sys': {'pod': 'd'},
   'dt_txt': '2022-08-20 09:00:00'},
  {'dt': 1660996800,
   'main': {'temp': 19.78,
    'feels_like': 20.09,
    'temp_min': 19.39,
    'temp_max': 19.78,
    'pressure': 1012,
    'sea_level': 1012,
    'grnd_level': 1010,
    'humidity': 87,
    'temp_kf': 0.39},
   'weather': [{'id': 500,
     'main': 'Rain',
     'description': 'light rain',
     'icon': '10d'}],
   'clouds': {'all': 47},
   'wind': {'speed': 1.49, 'deg': 282, 'gust': 3.15},
   'visibility': 10

Now we've discovered what information we have to work with. Let's decide what we want to keep and what we wish to lose.

I feel that from json['list'] it would be good to keep
- 'weather.main', 'weather.description', 'dt_txt', 'main.temp', 'main.feels_like' 'clouds.all', 'rain.3h', 'snow.3h' 'wind.speed', 'wind.deg', 'main.humidity', 'main.pressure'. 

And from json['city'] it would be good to keep 
- 'name' and 'country. 

Just to make sure that we got the right place. And as an added extra we'll also include the time that API call was made, so we know how up to date our forecast is.

**Optional:** let's get a timestamp of when we get the data. Datetime uses the uses the current time of the system, which on local computers is normally correct. But as we're in the cloud, computers are not always in our country, and we therefore add on the timezone module to ensure that our timestamp is local to us and not the computer.

In [25]:
tz = pytz.timezone('Europe/Berlin')
now = datetime.now().astimezone(tz)

now

datetime.datetime(2022, 8, 27, 13, 43, 55, 5676, tzinfo=<DstTzInfo 'Europe/Berlin' CEST+2:00:00 DST>)

**Next** let's loop through the json['list'] information get the weather information

In [26]:
# we'll store the information in this dicitonary:
weather_dict = {'city': [],
                'country': [],
                'forecast_time': [],
                'outlook': [],
                'detailed_outlook': [],
                'temperature': [],
                'temperature_feels_like': [],
                'clouds': [],
                'rain': [],
                'snow': [],
                'wind_speed': [],
                'wind_deg': [],
                'humidity': [],
                'pressure': []}
                #'information_retrieved_at': []}

# let's begin the loop
for i in json['list']:
  weather_dict['city'].append(json['city']['name'])
  weather_dict['country'].append(json['city']['country'])
  weather_dict['forecast_time'].append(i['dt_txt'])
  weather_dict['outlook'].append(i['weather'][0]['main'])
  weather_dict['detailed_outlook'].append(i['weather'][0]['description'])
  weather_dict['temperature'].append(i['main']['temp'])
  weather_dict['temperature_feels_like'].append(i['main']['feels_like'])
  weather_dict['clouds'].append(i['clouds']['all'])
  # sometimes the data is missing for rain and snow. As it is not always raining or snowing
  # we cannot make a DataFrame unless the lists are all the same length, therefore missing values are bad
  # here we say try to append a value if there is one. If not, append a 0
  try:
      weather_dict['rain'].append(i['rain']['3h'])
  except:
      weather_dict['rain'].append('0')
  try:
      weather_dict['snow'].append(i['snow']['3h'])
  except:
      weather_dict['snow'].append('0')
  weather_dict['wind_speed'].append(i['wind']['speed'])
  weather_dict['wind_deg'].append(i['wind']['deg'])
  weather_dict['humidity'].append(i['main']['humidity'])
  weather_dict['pressure'].append(i['main']['pressure'])
  #weather_dict['information_retrieved_at'].append(now.strftime("%d/%m/%Y %H:%M:%S"))
  

**Now** we convert our dictionary to a DataFrame

In [27]:
weather_from_dict_df = pd.DataFrame(weather_dict)

weather_from_dict_df.head()

Unnamed: 0,city,country,forecast_time,outlook,detailed_outlook,temperature,temperature_feels_like,clouds,rain,snow,wind_speed,wind_deg,humidity,pressure
0,Berlin,DE,2022-08-27 09:00:00,Rain,light rain,20.06,20.58,100,0.37,0,3.74,318,94,1009
1,Berlin,DE,2022-08-27 12:00:00,Rain,moderate rain,21.12,21.56,96,4.61,0,4.38,295,87,1010
2,Berlin,DE,2022-08-27 15:00:00,Rain,light rain,21.39,21.73,97,2.37,0,4.85,297,82,1011
3,Berlin,DE,2022-08-27 18:00:00,Rain,light rain,19.55,19.96,97,1.62,0,3.9,320,92,1012
4,Berlin,DE,2022-08-27 21:00:00,Rain,light rain,18.87,19.24,100,1.02,0,3.4,316,93,1013


**As a final step**, to keep everything tidy: let's bring everything we did together in a function. And allow the function to take a list of cities as an input

In [6]:
def get_weather_loop(cities):

  API_key = '4fe53ee5e34a7d900ed58bd74bbbb0b7'

  tz = pytz.timezone('Europe/Berlin')
  now = datetime.now().astimezone(tz)

  weather_dict = {'city': [], 
                'forecast_time': [],
                'outlook': [],
                'detailed_outlook': [],
                'temperature': [],
                'temperature_feels_like': [],
                'clouds': [],
                'rain': [],
                'snow': [],
                'wind_speed': [],
                'wind_deg': [],
                'humidity': [],
                'pressure': [],
                'information_retrieved_at': []}

  for city in cities:
    url = (f"http://api.openweathermap.org/data/2.5/forecast?q={city}&appid={API_key}&units=metric")
    response = requests.get(url)
    json = response.json()

    for i in json['list']:
      weather_dict['city'].append(json['city']['name'])
      weather_dict['country'].append(json['city']['country'])
      weather_dict['forecast_time'].append(i['dt_txt'])
      weather_dict['outlook'].append(i['weather'][0]['main'])
      weather_dict['detailed_outlook'].append(i['weather'][0]['description'])
      weather_dict['temperature'].append(i['main']['temp'])
      weather_dict['temperature_feels_like'].append(i['main']['feels_like'])
      weather_dict['clouds'].append(i['clouds']['all'])
      try:
          weather_dict['rain'].append(i['rain']['3h'])
      except:
          weather_dict['rain'].append('0')
      try:
          weather_dict['snow'].append(i['snow']['3h'])
      except:
          weather_dict['snow'].append('0')
      weather_dict['wind_speed'].append(i['wind']['speed'])
      weather_dict['wind_deg'].append(i['wind']['deg'])
      weather_dict['humidity'].append(i['main']['humidity'])
      weather_dict['pressure'].append(i['main']['pressure'])
      weather_dict['information_retrieved_at'].append(now.strftime("%d/%m/%Y %H:%M:%S"))

  return pd.DataFrame(weather_dict)

In [18]:
json['list']['wind']['speed']

TypeError: list indices must be integers or slices, not str

In [7]:
get_weather_loop(['Berlin', 'London'])

Unnamed: 0,city,country,forecast_time,outlook,detailed_outlook,temperature,temperature_feels_like,clouds,rain,snow,wind_speed,wind_deg,humidity,pressure,information_retrieved_at
0,Berlin,DE,2022-08-20 09:00:00,Clouds,few clouds,19.97,20.37,20,0,0,3.07,309,90,1010,20/08/2022 09:28:10
1,Berlin,DE,2022-08-20 12:00:00,Rain,light rain,19.78,20.09,47,0.17,0,1.49,282,87,1012,20/08/2022 09:28:10
2,Berlin,DE,2022-08-20 15:00:00,Rain,moderate rain,19.24,19.52,73,5.54,0,0.78,268,88,1013,20/08/2022 09:28:10
3,Berlin,DE,2022-08-20 18:00:00,Rain,light rain,19.11,19.37,100,0.11,0,1.43,191,88,1015,20/08/2022 09:28:10
4,Berlin,DE,2022-08-20 21:00:00,Clouds,overcast clouds,18.66,18.93,100,0,0,1.37,235,90,1015,20/08/2022 09:28:10
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
75,London,GB,2022-08-24 18:00:00,Clouds,overcast clouds,27.17,27.52,95,0,0,2.23,213,49,1013,20/08/2022 09:28:10
76,London,GB,2022-08-24 21:00:00,Rain,light rain,23.53,23.64,77,0.42,0,1.68,213,65,1014,20/08/2022 09:28:10
77,London,GB,2022-08-25 00:00:00,Rain,moderate rain,19.68,20.16,89,8.32,0,1.47,14,94,1013,20/08/2022 09:28:10
78,London,GB,2022-08-25 03:00:00,Rain,moderate rain,18.53,18.89,97,3.84,0,3.64,317,94,1013,20/08/2022 09:28:10


In [8]:
weather_data = get_weather_loop (['Berlin', 'Frankfurt', 'Hamburg', 'Paris', 'Brussels', 'Moscow', 'Stockholm', 'Madrid', 'Barcelona'])

In [15]:
weather_data.head()

Unnamed: 0,city,country,forecast_time,outlook,detailed_outlook,temperature,temperature_feels_like,clouds,rain,snow,wind_speed,wind_deg,humidity,pressure,information_retrieved_at
0,Berlin,DE,2022-08-20 09:00:00,Clouds,few clouds,19.97,20.37,20,0.0,0,3.07,309,90,1010,20/08/2022 09:28:10
1,Berlin,DE,2022-08-20 12:00:00,Rain,light rain,19.78,20.09,47,0.17,0,1.49,282,87,1012,20/08/2022 09:28:10
2,Berlin,DE,2022-08-20 15:00:00,Rain,moderate rain,19.24,19.52,73,5.54,0,0.78,268,88,1013,20/08/2022 09:28:10
3,Berlin,DE,2022-08-20 18:00:00,Rain,light rain,19.11,19.37,100,0.11,0,1.43,191,88,1015,20/08/2022 09:28:10
4,Berlin,DE,2022-08-20 21:00:00,Clouds,overcast clouds,18.66,18.93,100,0.0,0,1.37,235,90,1015,20/08/2022 09:28:10


In [16]:
weather_data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 360 entries, 0 to 359
Data columns (total 15 columns):
 #   Column                    Non-Null Count  Dtype  
---  ------                    --------------  -----  
 0   city                      360 non-null    object 
 1   country                   360 non-null    object 
 2   forecast_time             360 non-null    object 
 3   outlook                   360 non-null    object 
 4   detailed_outlook          360 non-null    object 
 5   temperature               360 non-null    float64
 6   temperature_feels_like    360 non-null    float64
 7   clouds                    360 non-null    int64  
 8   rain                      360 non-null    object 
 9   snow                      360 non-null    object 
 10  wind_speed                360 non-null    float64
 11  wind_deg                  360 non-null    int64  
 12  humidity                  360 non-null    int64  
 13  pressure                  360 non-null    int64  
 14  informatio

# Solution using `.json_normalize()`

**First** we need to make an API call to get our information

In [3]:
city = 'Berlin'
API_key = '4fe53ee5e34a7d900ed58bd74bbbb0b7'
url = (f"http://api.openweathermap.org/data/2.5/forecast?q={city}&appid={API_key}&units=metric") 
response = requests.get(url)
json = response.json()

If we now try to use `.json_normalize()` on the `list` part of the json, you'll see that json_normalize can get confused when it hits a list instead of a subdictionary. Take a look at the weather column below

In [4]:
pd.DataFrame((json['list'])).head()

Unnamed: 0,dt,main,weather,clouds,wind,visibility,pop,rain,sys,dt_txt
0,1661590800,"{'temp': 20.06, 'feels_like': 20.58, 'temp_min...","[{'id': 500, 'main': 'Rain', 'description': 'l...",{'all': 100},"{'speed': 3.74, 'deg': 318, 'gust': 4.38}",10000,0.38,{'3h': 0.37},{'pod': 'd'},2022-08-27 09:00:00
1,1661601600,"{'temp': 21.12, 'feels_like': 21.56, 'temp_min...","[{'id': 501, 'main': 'Rain', 'description': 'm...",{'all': 96},"{'speed': 4.38, 'deg': 295, 'gust': 4.72}",10000,0.87,{'3h': 4.61},{'pod': 'd'},2022-08-27 12:00:00
2,1661612400,"{'temp': 21.39, 'feels_like': 21.73, 'temp_min...","[{'id': 500, 'main': 'Rain', 'description': 'l...",{'all': 97},"{'speed': 4.85, 'deg': 297, 'gust': 6.72}",10000,0.83,{'3h': 2.37},{'pod': 'd'},2022-08-27 15:00:00
3,1661623200,"{'temp': 19.55, 'feels_like': 19.96, 'temp_min...","[{'id': 500, 'main': 'Rain', 'description': 'l...",{'all': 97},"{'speed': 3.9, 'deg': 320, 'gust': 7.8}",10000,0.92,{'3h': 1.62},{'pod': 'd'},2022-08-27 18:00:00
4,1661634000,"{'temp': 18.87, 'feels_like': 19.24, 'temp_min...","[{'id': 500, 'main': 'Rain', 'description': 'l...",{'all': 100},"{'speed': 3.4, 'deg': 316, 'gust': 7.18}",10000,0.68,{'3h': 1.02},{'pod': 'n'},2022-08-27 21:00:00


To fix this we need to flatten the nested list using the parameter `record_path`, and then select the other columns we'd like using the parameter `meta`. At the end of the json_normalize parameters we have `errors='ignore'`, which means that if ever there is a missing value in the JSON, a `NaN` gets inserted into the table. If you'd like more information, please check out the [docs](https://pandas.pydata.org/docs/reference/api/pandas.json_normalize.html)

In [18]:
json_norm_df_1 = pd.json_normalize(json['list'], record_path = ['weather'])
json_norm_df_1.head()

Unnamed: 0,id,main,description,icon
0,500,Rain,light rain,10d
1,501,Rain,moderate rain,10d
2,500,Rain,light rain,10d
3,500,Rain,light rain,10d
4,500,Rain,light rain,10n


In [19]:
json_norm_df = pd.json_normalize(json['list'], 
                                record_path=['weather'], 
                                meta=['dt_txt', ['main', 'temp'], ['main', 'feels_like'], 
                                     ['clouds', 'all'], ['rain', '3h'], ['snow', '3h'], ['wind', 'speed'], 
                                     ['wind', 'deg'], ['main', 'humidity'], ['main', 'pressure']],
                                errors='ignore')

json_norm_df.head()

Unnamed: 0,id,main,description,icon,dt_txt,main.temp,main.feels_like,clouds.all,rain.3h,snow.3h,wind.speed,wind.deg,main.humidity,main.pressure
0,500,Rain,light rain,10d,2022-08-27 09:00:00,20.06,20.58,100,0.37,,3.74,318,94,1009
1,501,Rain,moderate rain,10d,2022-08-27 12:00:00,21.12,21.56,96,4.61,,4.38,295,87,1010
2,500,Rain,light rain,10d,2022-08-27 15:00:00,21.39,21.73,97,2.37,,4.85,297,82,1011
3,500,Rain,light rain,10d,2022-08-27 18:00:00,19.55,19.96,97,1.62,,3.9,320,92,1012
4,500,Rain,light rain,10n,2022-08-27 21:00:00,18.87,19.24,100,1.02,,3.4,316,93,1013


Now let's tidy this DataFrame up by
- dropping some columns
- renaming some columns
- adding some columns

In [20]:
# drop id and icon
json_norm_df = json_norm_df.drop(columns=['id', 'icon']).copy()

In [21]:
# rename some columns
json_norm_df.rename(columns={'main': 'outlook',
                             'description': 'detailed_outlook',
                             'dt_txt': 'forecast_time',
                             'main.temp': 'temperature',
                             'main.feels_like': 'temperature_feels_like',
                             'clouds.all': 'clouds',
                             'rain.3h': 'rain',
                             'snow.3h': 'snow',
                             'wind.speed': 'wind_speed',
                             'wind.deg': 'wind_deg',
                             'main.humidity': 'humidity',
                             'main.pressure': 'pressure',},
                    inplace=True)

In [22]:
# add some columns
json_norm_df.insert(0, 'city', json['city']['name'])
json_norm_df.insert(1, 'country', json['city']['country'])
json_norm_df['information_retrieved_at'] = now.strftime("%d/%m/%Y %H:%M:%S")

# rearrange the columns
json_norm_df = json_norm_df[['city', 'country', 'forecast_time', 'outlook', 'detailed_outlook',
       'temperature', 'temperature_feels_like', 'clouds', 'rain', 'snow',
       'wind_speed', 'wind_deg', 'humidity', 'pressure',
       'information_retrieved_at']]

json_norm_df.head()

Unnamed: 0,city,country,forecast_time,outlook,detailed_outlook,temperature,temperature_feels_like,clouds,rain,snow,wind_speed,wind_deg,humidity,pressure,information_retrieved_at
0,Berlin,DE,2022-08-27 09:00:00,Rain,light rain,20.06,20.58,100,0.37,,3.74,318,94,1009,27/08/2022 10:39:37
1,Berlin,DE,2022-08-27 12:00:00,Rain,moderate rain,21.12,21.56,96,4.61,,4.38,295,87,1010,27/08/2022 10:39:37
2,Berlin,DE,2022-08-27 15:00:00,Rain,light rain,21.39,21.73,97,2.37,,4.85,297,82,1011,27/08/2022 10:39:37
3,Berlin,DE,2022-08-27 18:00:00,Rain,light rain,19.55,19.96,97,1.62,,3.9,320,92,1012,27/08/2022 10:39:37
4,Berlin,DE,2022-08-27 21:00:00,Rain,light rain,18.87,19.24,100,1.02,,3.4,316,93,1013,27/08/2022 10:39:37


Now we have a very similar DataFrame to what we had with the for loop. Again, let's put this in a function that can take a list of cities

In [23]:
def get_weather_norm(cities):
  
  API_key = '4fe53ee5e34a7d900ed58bd74bbbb0b7'
  df_list = []

  for city in cities:
    url = (f"http://api.openweathermap.org/data/2.5/forecast?q={city}&appid={API_key}&units=metric") 
    response = requests.get(url)
    json = response.json()

    json_norm_df = pd.json_normalize(json['list'], 
                                record_path=['weather'], 
                                meta=['dt_txt', ['main', 'temp'], ['main', 'feels_like'], ['clouds', 'all'], ['rain', '3h'], ['snow', '3h'], ['wind', 'speed'], ['wind', 'deg'], ['main', 'humidity'], ['main', 'pressure']], 
                                errors='ignore')
    json_norm_df.drop(columns=['id', 'icon'], inplace=True)
    json_norm_df.rename(columns={'main': 'outlook',
                             'description': 'detailed_outlook',
                             'dt_txt': 'forecast_time',
                             'main.temp': 'temperature',
                             'main.feels_like': 'temperature_feels_like',
                             'clouds.all': 'clouds',
                             'rain.3h': 'rain',
                             'snow.3h': 'snow',
                             'wind.speed': 'wind_speed',
                             'wind.deg': 'wind_deg',
                             'main.humidity': 'humidity',
                             'main.pressure': 'pressure',},
                    inplace=True)
    json_norm_df.insert(0, 'city', json['city']['name'])
    json_norm_df.insert(1, 'country', json['city']['country'])
    json_norm_df['information_retrieved_at'] = now.strftime("%d/%m/%Y %H:%M:%S")
    json_norm_df[['city', 'country', 'forecast_time', 'outlook', 'detailed_outlook',
          'temperature', 'temperature_feels_like', 'clouds', 'rain', 'snow',
          'wind_speed', 'wind_deg', 'humidity', 'pressure',
          'information_retrieved_at']]
    df_list.append(json_norm_df)
  return pd.concat(df_list, ignore_index=True)

In [24]:
get_weather_norm(['Berlin', 'London'])

Unnamed: 0,city,country,outlook,detailed_outlook,forecast_time,temperature,temperature_feels_like,clouds,rain,snow,wind_speed,wind_deg,humidity,pressure,information_retrieved_at
0,Berlin,DE,Rain,light rain,2022-08-27 09:00:00,20.61,21.13,100,0.37,,3.74,318,92,1008,27/08/2022 10:39:37
1,Berlin,DE,Rain,moderate rain,2022-08-27 12:00:00,21.49,21.94,96,4.61,,4.38,295,86,1009,27/08/2022 10:39:37
2,Berlin,DE,Rain,light rain,2022-08-27 15:00:00,21.57,21.92,97,2.37,,4.85,297,82,1011,27/08/2022 10:39:37
3,Berlin,DE,Rain,light rain,2022-08-27 18:00:00,19.55,19.96,97,1.62,,3.9,320,92,1012,27/08/2022 10:39:37
4,Berlin,DE,Rain,light rain,2022-08-27 21:00:00,18.87,19.24,100,1.02,,3.4,316,93,1013,27/08/2022 10:39:37
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
75,London,GB,Clouds,overcast clouds,2022-08-31 18:00:00,20.37,19.82,89,,,4.43,58,52,1021,27/08/2022 10:39:37
76,London,GB,Clear,clear sky,2022-08-31 21:00:00,17.27,16.7,3,,,4.88,43,63,1021,27/08/2022 10:39:37
77,London,GB,Clear,clear sky,2022-09-01 00:00:00,15.87,15.52,2,,,4.36,34,77,1020,27/08/2022 10:39:37
78,London,GB,Clear,clear sky,2022-09-01 03:00:00,15.57,15.35,1,,,4.23,32,83,1019,27/08/2022 10:39:37
