# Introduction to APIs

## 1.&nbsp;Recap: lists and dictionaries

### 1.1&nbsp;Lists

A list starts and ends with square brackets `[]` and contains elements that are separated with a comma `,`.

Here are a few examples:

In [3]:
# Empty list.
empty_list = []
empty_list

[]

In [4]:
# List containing numbers.
list_with_numbers = [2, 64, 3.5, 7392, -5]
list_with_numbers

[2, 64, 3.5, 7392, -5]

In [5]:
# List containing other lists as elements.
list_of_lists = [[1, 2, 3], ["apples", 280], [-1, True, 4.5]]
list_of_lists

[[1, 2, 3], ['apples', 280], [-1, True, 4.5]]

You can access the individual elements inside the list by using their index.

In [6]:
# Get the first element from the list_with_numbers.
list_with_numbers[0]

2

In [7]:
# Get the second element from the list_of_lists.
list_of_lists[1]

['apples', 280]

In [8]:
# Get the first element from the second list inside the list of lists.
list_of_lists[1][0]

'apples'
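Two more indexing details tend to matter once you start digging through API responses: negative indices count from the end of a list, and asking for an index that does not exist raises an `IndexError`. The following is a minimal sketch that reuses the lists from above; `safe_get` is a hypothetical helper introduced here only for illustration.

```python
list_with_numbers = [2, 64, 3.5, 7392, -5]
list_of_lists = [[1, 2, 3], ["apples", 280], [-1, True, 4.5]]

# Negative indices count from the end of the list.
last_number = list_with_numbers[-1]

# A slice returns a new list with the selected range of elements.
first_two = list_with_numbers[:2]

# Chained indexing also works with negative indices.
last_of_last = list_of_lists[-1][-1]


def safe_get(items, index, default=None):
    """Return items[index], or default if the index is out of range.

    Hypothetical helper, not part of Python's standard library.
    """
    try:
        return items[index]
    except IndexError:
        return default


missing = safe_get(list_with_numbers, 10, default="no such element")
```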

### 1.2&nbsp;Dictionaries

A dictionary starts and ends with curly brackets `{}` and contains key-value pairs which are separated by a comma `,`.

Here are a few examples:

In [9]:
# Empty dictionary.
empty_dictionary = {}
empty_dictionary

{}

In [10]:
# Simple dictionary.
simple_dictionary = {"surname": "Duck", "name": "Donald"}
simple_dictionary

{'surname': 'Duck', 'name': 'Donald'}

In [11]:
# Nested dictionary that contains another dictionary as one of the values.
nested_dictionary = {"surname": "Holmes", "name": "Sherlock", "address": {"street": "Baker Street 221b", "city": "London"}}
nested_dictionary

{'surname': 'Holmes',
 'name': 'Sherlock',
 'address': {'street': 'Baker Street 221b', 'city': 'London'}}

You can get a view of all the keys of the dictionary using `.keys()`. Wrap it in `list()` if you need an actual list.

In [12]:
# Show the keys inside the dictionary.
nested_dictionary.keys()

dict_keys(['surname', 'name', 'address'])

You can access the individual values inside the dictionary by using the name of the key.

In [13]:
# Get the value of the key "name".
simple_dictionary["name"]

'Donald'

In [14]:
# This also works with nested dictionaries.
nested_dictionary["address"]

{'street': 'Baker Street 221b', 'city': 'London'}

In [15]:
# Combinations are also possible.
nested_dictionary["address"].keys()

dict_keys(['street', 'city'])

In [16]:
nested_dictionary["address"]["street"]

'Baker Street 221b'
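Square brackets raise a `KeyError` when a key is missing, which happens often with real API responses where some fields are optional. One possible way to guard against this, sketched here with the dictionary from above, is the built-in `.get()` method. The keys `"phone"` and `"postcode"` are made up for this example and do not exist in the dictionary.

```python
nested_dictionary = {
    "surname": "Holmes",
    "name": "Sherlock",
    "address": {"street": "Baker Street 221b", "city": "London"},
}

# Square brackets raise a KeyError if the key is missing.
phone_lookup_failed = False
try:
    nested_dictionary["phone"]
except KeyError:
    phone_lookup_failed = True

# .get() returns a default value instead (None if no default is given).
phone = nested_dictionary.get("phone", "unknown")

# Chain .get() calls to walk into a nested dictionary safely.
postcode = nested_dictionary.get("address", {}).get("postcode", "unknown")
city = nested_dictionary.get("address", {}).get("city", "unknown")

# The `in` operator checks whether a key exists.
has_address = "address" in nested_dictionary
```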

## 2.&nbsp;Requests

The requests library is the de facto standard for making HTTP requests in Python. It abstracts the complexities of making requests behind a beautiful, simple API so that you can focus on interacting with services and consuming data in your application.

https://requests.readthedocs.io/en/latest/

When we make a request, we get a `requests.Response` object in return. The `requests.Response` object contains the server's response to the HTTP request. The response itself consists of several parts such as the content, the text or the status code.

The status code is a number that tells us whether we received the information we wanted or not. If you get a number that you don't understand, these cats will help you: https://http.cat/.

More often than not you'll receive one of these status codes:

- `200`: Success!

- `401`: Unauthorized. The request lacks valid authentication credentials.

- `403`: Forbidden. The server understood the request but refuses to authorize it.
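If you prefer looking up a status code in Python rather than on a website, the standard library's `http.HTTPStatus` knows the official phrase for each code. This is only a small sketch; `describe_status` and `is_success` are hypothetical helper names, not part of `requests`.

```python
from http import HTTPStatus


def describe_status(code):
    """Return a short human-readable label for an HTTP status code."""
    try:
        status = HTTPStatus(code)
    except ValueError:
        return f"{code}: not a status code known to the standard library"
    return f"{status.value}: {status.phrase}"


def is_success(code):
    """Codes from 200 to 299 signal success."""
    return 200 <= code < 300


descriptions = [describe_status(code) for code in (200, 401, 403)]
```

With a real response object you would pass `response.status_code` to these helpers. `requests` also offers `response.ok` and `response.raise_for_status()` for the same kind of check.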

In [17]:
import pandas as pd
import requests

Make a request to the website of The New York Times.

In [18]:
# The response from the website is stored inside the variable 'nyt'.
nyt = requests.get("https://www.nytimes.com/")
nyt

<Response [200]>

In [19]:
# Get the status code from the response.
nyt.status_code

200

In [20]:
# Get the text from the response.
nyt.text

'<!DOCTYPE html>\n<html lang="en" class=" nytapp-vi-homepage"  xmlns:og="http://opengraphprotocol.org/schema/">\n  <head>\n    <meta charset="utf-8" />\n    <title data-rh="true">The New York Times - Breaking News, US News, World News and Videos</title>\n    <meta data-rh="true" name="description" content="Live news, investigations, opinion, photos and video by the journalists of The New York Times from more than 150 countries around the world. Subscribe for coverage of U.S. and international news, politics, business, technology, science, health, arts, sports and more."/><meta data-rh="true" property="og:url" content="https://www.nytimes.com"/><meta data-rh="true" property="og:type" content="website"/><meta data-rh="true" property="og:title" content="The New York Times - Breaking News, US News, World News and Videos"/><meta data-rh="true" property="og:description" content="Live news, investigations, opinion, photos and video by the journalists of The New York Times from more than 150 c

Make a request to the website of WBS Coding School.

In [21]:
# The response from the website is stored inside the variable 'wbscs'.
wbscs = requests.get("https://www.wbscodingschool.com/")
wbscs

<Response [200]>

In [22]:
wbscs.text

'<!doctype html><html lang="en-US" prefix="og: https://ogp.me/ns#"><head><meta charset="UTF-8"><meta name="viewport" content="width=device-width, initial-scale=1"><link rel="profile" href="https://gmpg.org/xfn/11"><link rel="alternate" hreflang="en" href="https://www.wbscodingschool.com/" /><link rel="alternate" hreflang="de" href="https://www.wbscodingschool.com/de/" /><link rel="alternate" hreflang="en-es" href="https://www.wbscodingschool.com/es_en/" /><link rel="alternate" hreflang="en-gb" href="https://www.wbscodingschool.com/gb/" /><link rel="alternate" hreflang="x-default" href="https://www.wbscodingschool.com/" /><style type="text/css" media="all">img.wp-smiley,img.emoji{display:inline !important;border:0 !important;box-shadow:none !important;height:1em !important;width:1em !important;margin:0 .07em !important;vertical-align:-.1em !important;background:none !important;padding:0 !important}</style><style type="text/css" media="all">.wp-block-rank-math-toc-block nav ol{counter-re

## 3.&nbsp;JSON

### 3.1&nbsp;Viewing the JSON

https://docs.python.org/3/library/json.html

Since its inception, JSON has quickly become the de facto standard for information exchange. JSON supports primitive types, like strings and numbers, as well as nested lists and objects. It looks like nested Python dictionaries:

```
{"firstname": "Harry",
 "lastname": "Noah",
 "city": "Berlin",
 "dogs": [{"name": "rover",
   "breed": "labrador"},
   {"name": "pip",
   "breed": "spaniel"}],
 "cars": "none"}
```
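Because JSON is plain text, it has to be parsed before Python can treat it as dictionaries and lists. A minimal sketch with the built-in `json` module, using the example above as input:

```python
import json

json_text = """
{"firstname": "Harry",
 "lastname": "Noah",
 "city": "Berlin",
 "dogs": [{"name": "rover", "breed": "labrador"},
          {"name": "pip", "breed": "spaniel"}],
 "cars": "none"}
"""

# json.loads() parses a JSON string into Python objects.
person = json.loads(json_text)

# JSON objects become dictionaries, JSON arrays become lists.
person_type = type(person)
dogs_type = type(person["dogs"])

# Nested values are accessed exactly as in section 1.
second_dog_name = person["dogs"][1]["name"]

# json.dumps() goes the other way: Python objects to a JSON string.
round_trip = json.loads(json.dumps(person))
```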

In [23]:
import json

Send a `get` request to the International Space Station API to find out where the ISS is right now.

You can find the documentation here: http://open-notify.org/Open-Notify-API/ISS-Location-Now/.

In [24]:
# Make a request to the ISS API and store the response in `iss`.
iss = requests.get("http://api.open-notify.org/iss-now.json")

In [25]:
# Check the response.
iss

<Response [200]>

In [26]:
# Have a look at the text of the response.
iss.text

'{"iss_position": {"longitude": "153.0925", "latitude": "14.0827"}, "timestamp": 1709806195, "message": "success"}'

In [27]:
# Finally: view the response as a JSON.
iss.json()

{'iss_position': {'longitude': '153.0925', 'latitude': '14.0827'},
 'timestamp': 1709806195,
 'message': 'success'}
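Note that the latitude and longitude arrive as strings, and the timestamp as a plain number. The sketch below parses the response text shown above with `json.loads()` (which should give the same dictionary as `iss.json()`) and converts the values into more convenient types. It assumes the timestamp is a Unix timestamp in seconds; check the API documentation to confirm.

```python
import json
from datetime import datetime, timezone

# The response text copied from the output above.
iss_text = (
    '{"iss_position": {"longitude": "153.0925", "latitude": "14.0827"}, '
    '"timestamp": 1709806195, "message": "success"}'
)
iss_data = json.loads(iss_text)

# The coordinates are strings, so convert them before doing any maths.
iss_latitude = float(iss_data["iss_position"]["latitude"])
iss_longitude = float(iss_data["iss_position"]["longitude"])

# Assumption: the timestamp counts seconds since 1970-01-01 (Unix time).
observed_at = datetime.fromtimestamp(iss_data["timestamp"], tz=timezone.utc)
```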

### 3.2&nbsp;Accessing the data in the JSON

Now that we know
- what an API is,
- how to request information from one (requests),
- how the information will be delivered to us (JSON):

Let's see how we can use this information. We will first look at how we can access particular values within the JSON.

The following API returns the current sunrise and sunset times for a specified location. You can check out the website with the documentation here: https://sunrisesunset.io/api/.

The structure of the URL of the API is determined by the provider of the API and usually consists of several parts:

* A first part that leads to the specific API and is the same in every call.
  
  Example: `api.sunrise-sunset.org/json?`

* A mandatory part in which we define certain parameters in a pre-defined format.

  Example: the API requires both the latitude and the longitude of the location in decimal degrees `lat=36.7201600` and `lng=-4.4203400`.

* An optional part in which we can define more parameters in a pre-defined format.

  Example: the date can be specified in YYYY-MM-DD format `date=1990-05-22`, or in a relative format, such as “today” and “tomorrow”. If no date is specified, the default date will be today.
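The three parts described above can be assembled by hand, but the standard library can also build the query string for you. This is a small sketch using `urllib.parse.urlencode`; the variable names are chosen for illustration only.

```python
from urllib.parse import urlencode

# Part 1: the base URL, identical in every call.
base_url = "https://api.sunrise-sunset.org/json"

# Part 2: mandatory parameters.
mandatory = {"lat": 52.52, "lng": 13.24}

# Part 3: optional parameters.
optional = {"date": "2023-01-01"}

# urlencode() joins the key-value pairs with "=" and "&".
query = urlencode({**mandatory, **optional})
url = f"{base_url}?{query}"
```

`requests` can do the same job if you pass the dictionary via its `params` argument, which is the approach used in section 5.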

In [28]:
# Get the sunrise and sunset times for Berlin.
# First define the latitude, longitude and date.
latitude = 52.52
longitude = 13.24
date = "2023-01-01"

# Combine the different parts from above to create one url.
sun = requests.get(f"https://api.sunrise-sunset.org/json?lat={latitude}&lng={longitude}&date={date}")
sun_json = sun.json()
sun_json

{'results': {'sunrise': '7:15:34 AM',
  'sunset': '3:05:22 PM',
  'solar_noon': '11:10:28 AM',
  'day_length': '07:49:48',
  'civil_twilight_begin': '6:36:35 AM',
  'civil_twilight_end': '3:44:21 PM',
  'nautical_twilight_begin': '5:52:30 AM',
  'nautical_twilight_end': '4:28:26 PM',
  'astronomical_twilight_begin': '5:10:54 AM',
  'astronomical_twilight_end': '5:10:02 PM'},
 'status': 'OK',
 'tzid': 'UTC'}

Let's access a few values from the JSON.

Notice that the JSON is enclosed in curly brackets `{}`. This means that it is a dictionary and we can access the values by calling their respective keys.

In [29]:
# Which are the keys?
sun_json.keys()

dict_keys(['results', 'status', 'tzid'])

In [30]:
# What is the value of the results key?
sun_json["results"]

{'sunrise': '7:15:34 AM',
 'sunset': '3:05:22 PM',
 'solar_noon': '11:10:28 AM',
 'day_length': '07:49:48',
 'civil_twilight_begin': '6:36:35 AM',
 'civil_twilight_end': '3:44:21 PM',
 'nautical_twilight_begin': '5:52:30 AM',
 'nautical_twilight_end': '4:28:26 PM',
 'astronomical_twilight_begin': '5:10:54 AM',
 'astronomical_twilight_end': '5:10:02 PM'}

In [31]:
# What is the time of the sunrise?
sun_json["results"]["sunrise"]

'7:15:34 AM'
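The times are returned as strings, so they cannot be compared or subtracted directly. As a rough sketch, `datetime.strptime` can turn them into time values; here it is used to check that sunset minus sunrise matches the reported `day_length`. The format string assumes the 12-hour layout shown above and an English locale for the AM/PM marker.

```python
from datetime import datetime, timedelta

# Values copied from the output above.
results = {
    "sunrise": "7:15:34 AM",
    "sunset": "3:05:22 PM",
    "day_length": "07:49:48",
}

# %I = hour on a 12-hour clock, %p = AM/PM marker.
TIME_FORMAT = "%I:%M:%S %p"

sunrise_time = datetime.strptime(results["sunrise"], TIME_FORMAT)
sunset_time = datetime.strptime(results["sunset"], TIME_FORMAT)

# Subtracting two datetimes gives a timedelta.
computed_day_length = sunset_time - sunrise_time

# Turn the reported day length into a timedelta for comparison.
hours, minutes, seconds = (int(part) for part in results["day_length"].split(":"))
reported_day_length = timedelta(hours=hours, minutes=minutes, seconds=seconds)

lengths_match = computed_day_length == reported_day_length
```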

### 3.3&nbsp;Transforming a JSON into a DataFrame

#### 3.3.1&nbsp;Option 1: pd.DataFrame()

In [32]:
# Transform JSON into a DataFrame.
sun_df = pd.DataFrame(sun_json)
sun_df

|  | results | status | tzid |
| --- | --- | --- | --- |
| astronomical_twilight_begin | 5:10:54 AM | OK | UTC |
| astronomical_twilight_end | 5:10:02 PM | OK | UTC |
| civil_twilight_begin | 6:36:35 AM | OK | UTC |
| civil_twilight_end | 3:44:21 PM | OK | UTC |
| day_length | 07:49:48 | OK | UTC |
| nautical_twilight_begin | 5:52:30 AM | OK | UTC |
| nautical_twilight_end | 4:28:26 PM | OK | UTC |
| solar_noon | 11:10:28 AM | OK | UTC |
| sunrise | 7:15:34 AM | OK | UTC |
| sunset | 3:05:22 PM | OK | UTC |


This returns a DataFrame with one row per key of the inner dictionary, and we could even turn the index into a regular column using `.reset_index()`.

Still, we would prefer to have only a single row for a single day, with all the keys from the two different levels as column titles.

#### 3.3.2&nbsp;Option 2: pd.json_normalize()

Another way to handle nested dictionaries is by using `pd.json_normalize()`.

https://pandas.pydata.org/docs/reference/api/pandas.json_normalize.html

In [33]:
# Transform JSON into a DataFrame.
sun_norm = pd.json_normalize(sun_json)
sun_norm

|  | status | tzid | results.sunrise | results.sunset | results.solar_noon | results.day_length | results.civil_twilight_begin | results.civil_twilight_end | results.nautical_twilight_begin | results.nautical_twilight_end | results.astronomical_twilight_begin | results.astronomical_twilight_end |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | OK | UTC | 7:15:34 AM | 3:05:22 PM | 11:10:28 AM | 07:49:48 | 6:36:35 AM | 3:44:21 PM | 5:52:30 AM | 4:28:26 PM | 5:10:54 AM | 5:10:02 PM |


SUCCESS!
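To get a feeling for what flattening does, here is a simplified sketch in plain Python. It is not how pandas implements `json_normalize()`, and it does not reproduce the column order shown above; `flatten` is a hypothetical helper that only handles nested dictionaries.

```python
def flatten(nested, parent_key="", sep="."):
    """Flatten a nested dictionary into a single level with joined keys."""
    flat = {}
    for key, value in nested.items():
        # Prefix the key with the names of all enclosing dictionaries.
        full_key = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            # Recurse into the inner dictionary.
            flat.update(flatten(value, full_key, sep))
        else:
            flat[full_key] = value
    return flat


# A shortened version of the sunrise/sunset response from above.
sample = {
    "results": {"sunrise": "7:15:34 AM", "sunset": "3:05:22 PM"},
    "status": "OK",
    "tzid": "UTC",
}
flat_sample = flatten(sample)
```

Each key of the flattened dictionary corresponds to one column title, which is why a single response ends up as a single row.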

## 4.&nbsp;Creating DataFrames using for loops, lists and dictionaries



Often, there are several elements of the same type and structure inside one JSON. Let's have a look at an example and ask for the times for 3 days.

In [34]:
# Define the latitude, longitude and 3 days.
latitude = 52.52
longitude = 13.24
date = ["yesterday", "today", "tomorrow"]

In [35]:
# Get the JSON for the first value inside the date list (= yesterday).
sun = requests.get(f"https://api.sunrise-sunset.org/json?lat={latitude}&lng={longitude}&date={date[0]}")
sun_json = sun.json()
sun_json

{'results': {'sunrise': '5:39:02 AM',
  'sunset': '4:57:17 PM',
  'solar_noon': '11:18:10 AM',
  'day_length': '11:18:15',
  'civil_twilight_begin': '5:06:44 AM',
  'civil_twilight_end': '5:29:35 PM',
  'nautical_twilight_begin': '4:27:13 AM',
  'nautical_twilight_end': '6:09:07 PM',
  'astronomical_twilight_begin': '3:47:00 AM',
  'astronomical_twilight_end': '6:49:20 PM'},
 'status': 'OK',
 'tzid': 'UTC'}

To get the times for all three days, we will use a for loop. It requests the data for yesterday first, then for today, then for tomorrow, and adds them as separate elements to a list.

The resulting list looks much like the JSON you get from APIs that provide the same type of data multiple times inside a single JSON. This step will therefore usually not be necessary because your initial JSON will already have a comparable structure.

If you would like to review for loops and/or `range`, here is some useful information:

- https://www.w3schools.com/python/python_for_loops.asp

- https://www.w3schools.com/python/ref_func_range.asp

In [36]:
# Initialize an empty list to store all the JSONs.
times = []
# Loop over the elements in the date list.
# Request data from the API for each of the date elements.
# Save the returned JSON in the times list.
for i in range(len(date)):
  url = f"https://api.sunrise-sunset.org/json?lat={latitude}&lng={longitude}&date={date[i]}"
  times.append(requests.get(url).json())

In [37]:
times

[{'results': {'sunrise': '5:39:02 AM',
   'sunset': '4:57:17 PM',
   'solar_noon': '11:18:10 AM',
   'day_length': '11:18:15',
   'civil_twilight_begin': '5:06:44 AM',
   'civil_twilight_end': '5:29:35 PM',
   'nautical_twilight_begin': '4:27:13 AM',
   'nautical_twilight_end': '6:09:07 PM',
   'astronomical_twilight_begin': '3:47:00 AM',
   'astronomical_twilight_end': '6:49:20 PM'},
  'status': 'OK',
  'tzid': 'UTC'},
 {'results': {'sunrise': '5:36:45 AM',
   'sunset': '4:59:06 PM',
   'solar_noon': '11:17:55 AM',
   'day_length': '11:22:21',
   'civil_twilight_begin': '5:04:28 AM',
   'civil_twilight_end': '5:31:22 PM',
   'nautical_twilight_begin': '4:24:55 AM',
   'nautical_twilight_end': '6:10:55 PM',
   'astronomical_twilight_begin': '3:44:37 AM',
   'astronomical_twilight_end': '6:51:14 PM'},
  'status': 'OK',
  'tzid': 'UTC'},
 {'results': {'sunrise': '5:34:27 AM',
   'sunset': '5:00:54 PM',
   'solar_noon': '11:17:41 AM',
   'day_length': '11:26:27',
   'civil_twilight_begin'

If we only want to select certain parts of the JSON:
- Option 1: Transform the entire JSON into a DataFrame and drop the columns you don't need.
- Option 2: Use a for loop to extract only the required information. This is what we will look into as the next step.

### 4.1&nbsp;Write a for loop

Creating a for loop is usually an iterative process.

First, we select one value from our JSON that we would like to have in our final DataFrame. Let's start with yesterday's sunrise time.

> To access values inside nested lists and dictionaries, we need to apply the [Matryoshka doll](https://en.wikipedia.org/wiki/Matryoshka_doll) principle. This means that we access the outermost dictionary/list first, then the next one inside, then the next, and so on.

In [38]:
# First select the element of the list that contains yesterday's data.
# Then select the keys that navigate you to the sunrise time.
times[0]["results"]["sunrise"]

'5:39:02 AM'

Using this code, it is easy to get the sunrise time of today and tomorrow, too. Just change the index to that of the day you're interested in.

In [39]:
times[1]["results"]["sunrise"], times[2]["results"]["sunrise"]

('5:36:45 AM', '5:34:27 AM')

Now let's build a for loop with the code we have prepared:

In [40]:
# Create an empty list which will be filled with the sunrise times.
sunrise = []

# Iterate over the 3 dictionaries inside the times list.
for time in times:
  # For each day, get the sunrise time and append it to the sunrise list.
  sunrise.append(time["results"]["sunrise"])

In [41]:
# Let's have a look at the sunrise list.
sunrise

['5:39:02 AM', '5:36:45 AM', '5:34:27 AM']

SUCCESS!

Once this loop works with one value, more can be added. Let's add the sunset time and the day length.

In [42]:
# Create an empty list for each of the values.
sunrise = []
sunset = []
day_length = []

# Iterate over the 3 dictionaries inside the times list.
for time in times:
  # For each day, get the sunrise time and append it to the sunrise list.
  sunrise.append(time["results"]["sunrise"])
  # For each day, get the sunset time and append it to the sunset list.
  sunset.append(time["results"]["sunset"])
  # For each day, get the day length and append it to the day length list.
  day_length.append(time["results"]["day_length"])

Let's have a look at all these lists.

In [43]:
sunrise, sunset, day_length

(['5:39:02 AM', '5:36:45 AM', '5:34:27 AM'],
 ['4:57:17 PM', '4:59:06 PM', '5:00:54 PM'],
 ['11:18:15', '11:22:21', '11:26:27'])
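Once the loop is understood, the same extraction can be written more compactly with list comprehensions. This is an alternative sketch, not a replacement for the loop; the `times` list below is a shortened copy of the data shown above so that the example runs without calling the API.

```python
# Shortened copy of the API responses from above (only three keys per day).
times = [
    {"results": {"sunrise": "5:39:02 AM", "sunset": "4:57:17 PM", "day_length": "11:18:15"}},
    {"results": {"sunrise": "5:36:45 AM", "sunset": "4:59:06 PM", "day_length": "11:22:21"}},
    {"results": {"sunrise": "5:34:27 AM", "sunset": "5:00:54 PM", "day_length": "11:26:27"}},
]

# One list comprehension per value: loop and append in a single line.
sunrise = [day["results"]["sunrise"] for day in times]
sunset = [day["results"]["sunset"] for day in times]
day_length = [day["results"]["day_length"] for day in times]
```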

### 4.2&nbsp;Create a DataFrame using lists

As a final step, we want to merge these lists into a DataFrame.

In [44]:
# Zip the different lists, and add the column labels at the end.
times_df = pd.DataFrame(list(zip(sunrise, sunset, day_length)), columns=["Sunrise", "Sunset", "Day_length"])
times_df

|  | Sunrise | Sunset | Day_length |
| --- | --- | --- | --- |
| 0 | 5:39:02 AM | 4:57:17 PM | 11:18:15 |
| 1 | 5:36:45 AM | 4:59:06 PM | 11:22:21 |
| 2 | 5:34:27 AM | 5:00:54 PM | 11:26:27 |
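If `zip` is new to you, it helps to look at what it produces before the result is handed to `pd.DataFrame()`. A minimal sketch using the three lists from above:

```python
sunrise = ["5:39:02 AM", "5:36:45 AM", "5:34:27 AM"]
sunset = ["4:57:17 PM", "4:59:06 PM", "5:00:54 PM"]
day_length = ["11:18:15", "11:22:21", "11:26:27"]

# zip() pairs up the elements at the same position: one tuple per day.
rows = list(zip(sunrise, sunset, day_length))
first_row = rows[0]

# zip() stops at the shortest input, so lists of unequal length
# silently lose data.
truncated = list(zip([1, 2, 3], ["a", "b"]))
```

Each tuple in `rows` becomes one row of the DataFrame, which is why the column labels have to be supplied separately.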


### 4.3&nbsp;Create a DataFrame using dictionaries

Alternatively, a DataFrame can be created from a dictionary, using the lists from the for loop above.

In [45]:
# Pass a dictionary inside pd.DataFrame().
# Its keys are the column labels.
# Its values are the lists containing the data to become the columns of your DataFrame.
# We can even easily add the list containing the dates.
times_df_dictionary = pd.DataFrame({"Day": date,
                                    "Sunrise": sunrise,
                                    "Sunset": sunset,
                                    "Day length": day_length}
                                   )
times_df_dictionary

|  | Day | Sunrise | Sunset | Day length |
| --- | --- | --- | --- | --- |
| 0 | yesterday | 5:39:02 AM | 4:57:17 PM | 11:18:15 |
| 1 | today | 5:36:45 AM | 4:59:06 PM | 11:22:21 |
| 2 | tomorrow | 5:34:27 AM | 5:00:54 PM | 11:26:27 |


Another option is to create the dictionary first, and then use the for loop to fill its values.

In [46]:
# Create a dictionary with an empty list for each of the values.
times_dictionary = {"Sunrise": [],
                    "Sunset": [],
                    "Day_length": []}

# Iterate over the 3 dictionaries inside the times list.
for time in times:
  # For each day, get the sunrise time and append it to the sunrise list.
  times_dictionary["Sunrise"].append(time["results"]["sunrise"])
  # For each day, get the sunset time and append it to the sunset list.
  times_dictionary["Sunset"].append(time["results"]["sunset"])
  # For each day, get the day length and append it to the day length list.
  times_dictionary["Day_length"].append(time["results"]["day_length"])

times_dictionary

{'Sunrise': ['5:39:02 AM', '5:36:45 AM', '5:34:27 AM'],
 'Sunset': ['4:57:17 PM', '4:59:06 PM', '5:00:54 PM'],
 'Day_length': ['11:18:15', '11:22:21', '11:26:27']}

In [47]:
# Create a DataFrame from the dictionary.
times_dictionary_df = pd.DataFrame(times_dictionary)
times_dictionary_df

|  | Sunrise | Sunset | Day_length |
| --- | --- | --- | --- |
| 0 | 5:39:02 AM | 4:57:17 PM | 11:18:15 |
| 1 | 5:36:45 AM | 4:59:06 PM | 11:22:21 |
| 2 | 5:34:27 AM | 5:00:54 PM | 11:26:27 |
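A third layout worth knowing is a list of dictionaries, one dictionary per row. `pd.DataFrame()` accepts this shape as well, as section 5.2 shows with the list returned by the cities API. The sketch below builds such a list; `dates` and `times` are shortened copies of the data above so that it runs without calling the API.

```python
dates = ["yesterday", "today", "tomorrow"]
times = [
    {"results": {"sunrise": "5:39:02 AM", "sunset": "4:57:17 PM", "day_length": "11:18:15"}},
    {"results": {"sunrise": "5:36:45 AM", "sunset": "4:59:06 PM", "day_length": "11:22:21"}},
    {"results": {"sunrise": "5:34:27 AM", "sunset": "5:00:54 PM", "day_length": "11:26:27"}},
]

# Build one dictionary per day; its keys will become the column labels.
records = []
for day, response in zip(dates, times):
    records.append({
        "Day": day,
        "Sunrise": response["results"]["sunrise"],
        "Sunset": response["results"]["sunset"],
        "Day_length": response["results"]["day_length"],
    })
```

Passing `records` to `pd.DataFrame()` should give one row per day, with all values for a day kept together in one place.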


## 5.&nbsp;APIs with keys

Some APIs restrict access to their data in the sense that users need to register beforehand. Upon registration, users receive an API key that is linked to their account. This has two main consequences:

* Each request needs to include the API key, either in the URL or, as in the example below, in a request header.
* The API can track the number and type of requests made. This enables APIs to limit the number of requests per account within a certain timeframe.

Let's have a look at how this works with an API that provides useful statistics about many cities around the world: https://api-ninjas.com/api/city

There are a few parameters available to filter the cities returned. We will use these:
* name (optional) - name of city.
* country (optional) - country filter. Must be an ISO-3166 alpha-2 country code (e.g. US).
* min_population (optional) - minimum city population.
* limit (optional) - How many results to return. Must be between 1 and 30. Default is 1.

### 5.1&nbsp;Accessing the data

In [48]:
# Define the parameters to be passed into the url.
country = "DE"
min_population = "500000"
limit = "5"
API_key = "qClItv2fg4hGDiVLf6zTNw==rLE3xLfQa0uwZFD0"

# Reference the parameters in the url.
cities = requests.get(f"https://api.api-ninjas.com/v1/city?country={country}&min_population={min_population}&limit={limit}", headers={'X-Api-Key': API_key})
cities_json = cities.json()
cities_json

[{'name': 'Berlin',
  'latitude': 52.5167,
  'longitude': 13.3833,
  'country': 'DE',
  'population': 3644826,
  'is_capital': True},
 {'name': 'Hamburg',
  'latitude': 53.55,
  'longitude': 10.0,
  'country': 'DE',
  'population': 1841179,
  'is_capital': False},
 {'name': 'Munich',
  'latitude': 48.1372,
  'longitude': 11.5755,
  'country': 'DE',
  'population': 1471508,
  'is_capital': False},
 {'name': 'Cologne',
  'latitude': 50.9422,
  'longitude': 6.9578,
  'country': 'DE',
  'population': 1085664,
  'is_capital': False},
 {'name': 'Frankfurt',
  'latitude': 50.1136,
  'longitude': 8.6797,
  'country': 'DE',
  'population': 753056,
  'is_capital': False}]

In [49]:
# Define the parts of the request: the URL, the headers and the query parameters.
url = "https://api.api-ninjas.com/v1/city"
header = {"X-Api-Key": "qClItv2fg4hGDiVLf6zTNw==rLE3xLfQa0uwZFD0"}
querystring = {"country": "DE", "min_population": "500000", "limit": "5"}

# Pass the parts to the request.
cities = requests.request("GET", url, headers=header, params=querystring)
cities_json = cities.json()
cities_json

[{'name': 'Berlin',
  'latitude': 52.5167,
  'longitude': 13.3833,
  'country': 'DE',
  'population': 3644826,
  'is_capital': True},
 {'name': 'Hamburg',
  'latitude': 53.55,
  'longitude': 10.0,
  'country': 'DE',
  'population': 1841179,
  'is_capital': False},
 {'name': 'Munich',
  'latitude': 48.1372,
  'longitude': 11.5755,
  'country': 'DE',
  'population': 1471508,
  'is_capital': False},
 {'name': 'Cologne',
  'latitude': 50.9422,
  'longitude': 6.9578,
  'country': 'DE',
  'population': 1085664,
  'is_capital': False},
 {'name': 'Frankfurt',
  'latitude': 50.1136,
  'longitude': 8.6797,
  'country': 'DE',
  'population': 753056,
  'is_capital': False}]
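In both cells above the API key is written directly into the code, so anyone who sees the notebook can use it. A common alternative is to keep the key in an environment variable and read it at runtime. The sketch below shows one way to do that; the variable name `API_NINJAS_KEY` and the helper `build_auth_headers` are assumptions made for this example, not something the API prescribes.

```python
import os


def build_auth_headers(env_var="API_NINJAS_KEY"):
    """Build the request headers from an API key stored in an environment variable.

    The variable name is a hypothetical choice for this example.
    """
    api_key = os.environ.get(env_var)
    if not api_key:
        raise RuntimeError(f"Set the {env_var} environment variable first.")
    return {"X-Api-Key": api_key}


# For demonstration only: in practice, set the variable outside your code,
# for example in your shell or in your notebook environment's settings.
os.environ["API_NINJAS_KEY"] = "placeholder-key"

headers = build_auth_headers()
```

The resulting `headers` dictionary can be passed to `requests.get()` through its `headers` argument, just like the hard-coded one.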

### 5.2&nbsp;Transforming the data

Create a DataFrame that contains the information from the JSON.

In [50]:
cities_df = pd.DataFrame(cities_json)
cities_df

|  | name | latitude | longitude | country | population | is_capital |
| --- | --- | --- | --- | --- | --- | --- |
| 0 | Berlin | 52.5167 | 13.3833 | DE | 3644826 | True |
| 1 | Hamburg | 53.55 | 10.0 | DE | 1841179 | False |
| 2 | Munich | 48.1372 | 11.5755 | DE | 1471508 | False |
| 3 | Cologne | 50.9422 | 6.9578 | DE | 1085664 | False |
| 4 | Frankfurt | 50.1136 | 8.6797 | DE | 753056 | False |


Drop the column `is_capital`.

In [51]:
cities_df = cities_df.drop(["is_capital"], axis=1)

Change the data type of the column `population` to float.

In [52]:
# Check the data type of all columns first.
cities_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5 entries, 0 to 4
Data columns (total 5 columns):
 #   Column      Non-Null Count  Dtype  
---  ------      --------------  -----  
 0   name        5 non-null      object 
 1   latitude    5 non-null      float64
 2   longitude   5 non-null      float64
 3   country     5 non-null      object 
 4   population  5 non-null      int64  
dtypes: float64(2), int64(1), object(2)
memory usage: 328.0+ bytes


In [53]:
# Change the data type.
cities_df['population'] = pd.to_numeric(cities_df['population'], downcast='float')

In [54]:
# Check the data type of all columns again.
cities_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5 entries, 0 to 4
Data columns (total 5 columns):
 #   Column      Non-Null Count  Dtype  
---  ------      --------------  -----  
 0   name        5 non-null      object 
 1   latitude    5 non-null      float64
 2   longitude   5 non-null      float64
 3   country     5 non-null      object 
 4   population  5 non-null      float32
dtypes: float32(1), float64(2), object(2)
memory usage: 308.0+ bytes


Create a new column which combines the `name` and `country` columns. The values in the new column are unique, which is handy for later use, e.g. as a primary key in SQL.

In [55]:
cities_df["name_country"] = cities_df["name"] + ", " + cities_df["country"]

In [56]:
cities_df

|  | name | latitude | longitude | country | population | name_country |
| --- | --- | --- | --- | --- | --- | --- |
| 0 | Berlin | 52.5167 | 13.3833 | DE | 3644826.0 | Berlin, DE |
| 1 | Hamburg | 53.55 | 10.0 | DE | 1841179.0 | Hamburg, DE |
| 2 | Munich | 48.1372 | 11.5755 | DE | 1471508.0 | Munich, DE |
| 3 | Cologne | 50.9422 | 6.9578 | DE | 1085664.0 | Cologne, DE |
| 4 | Frankfurt | 50.1136 | 8.6797 | DE | 753056.0 | Frankfurt, DE |
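Before relying on the combined column as a key, it is worth verifying that its values really are unique. The following sketch does this in plain Python with a shortened copy of the city data; `find_duplicates` is a hypothetical helper. In pandas, the `is_unique` attribute of a column should give the same answer.

```python
from collections import Counter

# Shortened copy of the city data from above.
cities = [
    {"name": "Berlin", "country": "DE"},
    {"name": "Hamburg", "country": "DE"},
    {"name": "Munich", "country": "DE"},
    {"name": "Cologne", "country": "DE"},
    {"name": "Frankfurt", "country": "DE"},
]

# Build the combined key the same way as the name_country column.
name_country = [f"{city['name']}, {city['country']}" for city in cities]

# A set drops duplicates, so equal lengths mean every key is unique.
is_unique = len(set(name_country)) == len(name_country)


def find_duplicates(values):
    """Return the values that occur more than once."""
    return [value for value, count in Counter(values).items() if count > 1]


duplicates = find_duplicates(name_country)
```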
