# Working with Data from the Web

## Grading instructions

1. Launch VS Code and open your working-folder
2. Create a `Session_02` folder, in which you create another folder called `data`
3. Copy and paste this notebook `02_Data_from_the_Web_lecture` from the lectures repo into the working-folder.
4. Copy the json file `animals.json` and the zip file `arbeitsmarktstatistik_erwerbslosenquoten_geschlecht.zip` into the `working-folder/Session_02/data` directory.

#### There are two `Self-work exercises` in this notebook
1. During the course, do the self-work exercises
2. Once finished, copy and paste this notebook `02_Data_from_the_Web_lecture` into the `ESMT_2024_DataScraping_Students` folder on your computer
3. Commit and push your self-work in your branch before the deadline. **Push only the notebook, not the files!**

#### Number of points: 10 (worth 10% of the final grade)
- `Self-work exercises #1`: 4 points
- `Self-work exercises #2`: 6 points



#### Deadline: October 18th 08:59 am CET
#### Any missed deadline without justification to the Administration will result in 0 points for this homework.
#### If the GitHub branch is not correctly named using the indicated format **LASTNAME_firstname**, a penalty of 2 points will be applied

## Course content

* Introduction to APIs
* Using the `requests` package to download files
* Loading files and tables from URLs (Wikipedia)
* Working with zip files in Python
* Introduction to JSON (`read_json`, `json_normalize`)

# Introduction to API

An API, short for Application Programming Interface, is essentially a piece of intermediary software (the interface) that facilitates communication between two other pieces of software (the applications).

This very broad term is frequently used for web-based systems, database systems, operating systems, or even computer hardware. 

In this chapter we will focus on web-based APIs.

### What is a Web API?

A Web API typically means a special website or URL that we use as a channel to get data from a company or a web-based program.

We can write a Python program to retrieve data from the API. Put very bluntly, an API is a website providing data that is easy for a machine (e.g. Python code) to understand, as opposed to a prettier, HTML-rendered user interface for humans.

**Intro to API (duration: 3'24):**
https://www.youtube.com/watch?v=s7wmiS2mSXY

### Examples of Web APIs
* Google Maps: get map coordinates for an address
* Spotify: read and modify a playlist
* GitHub: read statistics on your code repo
* WeatherAPI: get weather data for a specific location
* Google Translate: translate texts directly from a Python script

<br>

# Python Requests Library

The Python requests library is a popular third-party library that simplifies the process of making HTTP requests and working with HTTP responses. 

It provides a high-level interface for sending HTTP requests to web servers and receiving their responses. 

This library is widely used for tasks like fetching data from APIs, sending data to servers, and interacting with web resources.

Installation: You can install the requests library using pip, a package installer for Python:

In [1]:
!pip install requests==2.32.3



Importing: After installation, you need to import the library in your Python code before you can use it:

In [3]:
import requests

HTTP Methods: The library supports various HTTP methods, such as GET, POST, PUT, and DELETE. These correspond roughly to the read, create, update, and delete operations respectively (collectively known as CRUD). You can choose the appropriate method for your request.

To make a GET request:

In [5]:
response = requests.get("https://randomuser.me/api/")

## API status code

In [13]:
print(response.status_code)

200


Status codes are returned with every request that is made to a web server. Status codes indicate information about what happened with a request. Here are some codes that are relevant to GET requests:

* 200: Everything went okay, and the result has been returned (if any).
* 301: The server is redirecting you to a different endpoint. This can happen when a company switches domain names, or an endpoint name is changed.
* 400: The server thinks you made a bad request. This can happen when you don’t send along the right data, among other things.
* 401: The server thinks you’re not authenticated. Many APIs require login credentials, so this happens when you don’t send the right credentials to access an API.
* 403: The resource you’re trying to access is forbidden: you don’t have the right permissions to see it.
* 404: The resource you tried to access wasn’t found on the server.
* 503: The server is not ready to handle the request.

For more information about the various HTTP status codes [click here](https://en.wikipedia.org/wiki/List_of_HTTP_status_codes)
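
As a rough illustration of how the codes listed above fall into families, the sketch below uses Python's standard-library `http.HTTPStatus` enum to turn a numeric code into a readable label. The helper name `describe_status` is hypothetical and is not part of the requests library.

```python
from http import HTTPStatus


def describe_status(code: int) -> str:
    """Return a short, human-readable description of an HTTP status code."""
    phrase = HTTPStatus(code).phrase  # e.g. "Not Found" for 404
    if 200 <= code < 300:
        family = "success"
    elif 300 <= code < 400:
        family = "redirection"
    elif 400 <= code < 500:
        family = "client error"
    elif 500 <= code < 600:
        family = "server error"
    else:
        family = "informational"
    return f"{code} {phrase} ({family})"


# Describe the codes discussed in this section
for code in (200, 301, 400, 401, 403, 404, 503):
    print(describe_status(code))
```

When working with requests directly, calling `response.raise_for_status()` raises an exception for 4xx and 5xx responses, which can be a convenient way to stop early when a request failed.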

## Response Object


When you send a request using the requests library, you receive a response object that contains information about the server’s response:

* `response.status_code`: HTTP status code of the response (200 means the operation was successful)
* `response.content`: Raw content of the response, as bytes
* `response.text`: Content of the response in text format
* `response.json()`: Parses the response content as JSON if applicable
* `response.headers`: Headers received from the server. Headers carry additional information with both the request and the response, for example the list of acceptable encodings, and are usually hidden from the end user.
* `response.url`: The URL that was accessed


In [15]:
response.json()

{'results': [{'gender': 'male',
   'name': {'title': 'Mr', 'first': 'William', 'last': 'Walker'},
   'location': {'street': {'number': 102, 'name': 'Cedar St'},
    'city': 'Hampton',
    'state': 'Nova Scotia',
    'country': 'Canada',
    'postcode': 'E4F 0M5',
    'coordinates': {'latitude': '-68.7472', 'longitude': '-67.4390'},
    'timezone': {'offset': '+7:00', 'description': 'Bangkok, Hanoi, Jakarta'}},
   'email': 'william.walker@example.com',
   'login': {'uuid': 'fec0368c-9cf2-47ad-b09a-a2c9aa785962',
    'username': 'organicmouse713',
    'password': '7grout',
    'salt': 'JX9lCP27',
    'md5': 'a57f7b5fbe93bfd77704ad4cf0738e43',
    'sha1': '6ccfd2fe51ebb2eeea16d92c8c835d4166a825df',
    'sha256': '5808524ab250b17be8f29cbc28ccb746a64a1b03f63835c9958e06cfa0e1e069'},
   'dob': {'date': '1953-11-17T01:12:18.286Z', 'age': 70},
   'registered': {'date': '2018-05-14T20:20:21.523Z', 'age': 6},
   'phone': 'R64 N35-4112',
   'cell': 'Q75 M21-1222',
   'id': {'name': 'SIN', 'value':

## Query String Parameters

One common way to customize a GET request is to pass values through query string parameters in the URL. 

To do this with `requests.get()`, you pass a dictionary of values to the `params` argument.
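
To make the idea of a query string more concrete, here is a minimal sketch that builds one by hand with the standard library's `urllib.parse.urlencode`. The base URL `https://api.example.com/weather` is a made-up placeholder; requests performs a comparable encoding step when it receives a `params` dictionary.

```python
from urllib.parse import urlencode

# Hypothetical endpoint, used only to illustrate the URL structure
base_url = "https://api.example.com/weather"

params = {
    "lat": 52.520008,
    "lon": 13.404954,
    "start": "2023-10-09",
    "end": "2023-10-09",
}

# Each key/value pair becomes key=value, and pairs are joined with "&"
query_string = urlencode(params)
full_url = f"{base_url}?{query_string}"

print(query_string)
print(full_url)
```

After a real request, the final URL including the encoded parameters can be inspected through `response.url`.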

### Weather API:
1. Sign up to RapidAPI: https://rapidapi.com/signup
2. Select the free plan: https://rapidapi.com/meteostat/api/meteostat/pricing 

It allows 500 requests per month, with a maximum of 3 requests per second.

3. Go to this page and click on `Get Daily Station Data` on the left pane: https://rapidapi.com/meteostat/api/meteostat

4. The `x-rapidapi-key` should appear in the `Code snippets` panel on the right: copy and paste it below

5. Complete the information below to get the weather of Berlin today

### Important note: do not push this notebook to GitHub! It contains your API key, which you don't want to reveal. If you do push this content to GitHub, delete your API key first!
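
One possible way to keep the key out of the notebook altogether is to read it from an environment variable. The sketch below assumes a variable called `RAPIDAPI_KEY`; that name and the helper `load_api_key` are hypothetical choices, not something RapidAPI requires.

```python
import os


def load_api_key(env_var: str = "RAPIDAPI_KEY") -> str:
    """Read the API key from an environment variable instead of the notebook."""
    key = os.environ.get(env_var)
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable first.")
    return key


# Demonstration only: set a dummy value so that this example runs on its own.
# In practice the variable would be set outside the notebook.
os.environ["RAPIDAPI_KEY"] = "dummy-key-for-demo"

private_api_key = load_api_key()
print(len(private_api_key))
```

With this approach the notebook only contains the variable name, so pushing it does not reveal the key itself.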


In [39]:
# Use double quotes to assign your API key to the private_api_key variable as a string
private_api_key = "YOUR_RAPIDAPI_KEY"  # Placeholder: replace with your own key

In [35]:
# Define your parameters as a dictionary
params = {
    "lat":  52.520008,  # Find Berlin's latitude and add it here
    "lon": 13.404954,  # Find Berlin's longitude and add it here
    "start": "2023-10-09",  # Replace with today's date
    "end": "2023-10-09"  # Replace with today's date
}

response = requests.get("https://meteostat.p.rapidapi.com/point/daily",
                       params=params,
                       headers={
                           "X-RapidAPI-Host": "meteostat.p.rapidapi.com",
                           "X-RapidAPI-Key": private_api_key,  # Add the private_api_key variable
                       })

# Do not push your API key to Github!

In [37]:
response.json()

{'meta': {'generated': '2024-10-08 07:49:25',
  'stations': ['10384', '10381', 'D0400', '10385']},
 'data': [{'date': '2023-10-09',
   'tavg': 10.2,
   'tmin': 8.9,
   'tmax': 11.7,
   'prcp': 6.8,
   'snow': 0.0,
   'wdir': 127.0,
   'wspd': 8.3,
   'wpgt': 20.5,
   'pres': 1019.0,
   'tsun': 0}]}
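
The parsed response is an ordinary Python dictionary, so individual values can be read with keys and indexes. The sketch below copies part of the output shown above into a variable named `payload` (a name chosen here for illustration) and summarises each day. The units in the printed text (°C and mm) are assumed to be Meteostat's metric defaults.

```python
# Sample payload copied from the response shown above (shortened)
payload = {
    "meta": {"generated": "2024-10-08 07:49:25"},
    "data": [
        {"date": "2023-10-09", "tavg": 10.2, "tmin": 8.9, "tmax": 11.7, "prcp": 6.8}
    ],
}

spreads = []
for day in payload["data"]:
    # Difference between the daily maximum and minimum temperature
    spread = round(day["tmax"] - day["tmin"], 1)
    spreads.append(spread)
    print(f'{day["date"]}: avg {day["tavg"]} °C, range {spread} °C, rain {day["prcp"]} mm')
```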

<br>

# Data from Wikipedia

## Installation

First, install and import the package.

*Note: all information about the packages and their versions can be found on [PyPI](https://pypi.org/), for example the [wikipedia package](https://pypi.org/project/wikipedia/)*

In [41]:
!pip install wikipedia==1.4.0

Collecting wikipedia==1.4.0
  Downloading wikipedia-1.4.0.tar.gz (27 kB)
  Preparing metadata (setup.py): started
  Preparing metadata (setup.py): finished with status 'done'
Building wheels for collected packages: wikipedia
  Building wheel for wikipedia (setup.py): started
  Building wheel for wikipedia (setup.py): finished with status 'done'
  Created wheel for wikipedia: filename=wikipedia-1.4.0-py3-none-any.whl size=11706 sha256=0c0ecb310fb2f33aabd96391d176d90a2d117342004f71c2960bca8e378d36e1
  Stored in directory: c:\users\taylor\appdata\local\pip\cache\wheels\63\47\7c\a9688349aa74d228ce0a9023229c6c0ac52ca2a40fe87679b8
Successfully built wikipedia
Installing collected packages: wikipedia
Successfully installed wikipedia-1.4.0


In [43]:
# Import the package
import wikipedia

## First commands
* wikipedia.summary(<>): provides a summary of the page
* wikipedia.page(<>).content: provides the content
* wikipedia.page(<>).url: provides the URL of the page
* wikipedia.set_lang(<>): changes the request language

### Try the following commands:

In [67]:
wikipedia.summary("Neon Genesis Evangelion")

"Neon Genesis Evangelion (Japanese: 新世紀エヴァンゲリオン, Hepburn: Shinseiki Evangerion, lit.\u2009'New Century Evangelion' in Japanese and lit.\u2009'New Beginning Gospel' in Greek), also known as Evangelion or Eva, is a Japanese mecha anime television series produced by Gainax, animated by Tatsunoko, and directed by Hideaki Anno. It was broadcast on TV Tokyo from October 1995 to March 1996. \nEvangelion is set 15 years after a worldwide cataclysm in the futuristic fortified city of Tokyo-3. The protagonist is Shinji Ikari, a teenage boy recruited by his father Gendo to the mysterious organization Nerv. Shinji must pilot an Evangelion, a giant biomechanical mecha, to fight beings known as Angels.\nThe series explores the experiences and emotions of the Evangelion pilots and Nerv members as they battle Angels. They are called upon to understand the ultimate cause of events and the motives behind human action. The series has been described as a deconstruction of the mecha genre, and features arc

In [61]:
wikipedia.summary("Gurren Lagann")

'Gurren Lagann, known in Japan as Tengen Toppa Gurren Lagann (Japanese: 天元突破グレンラガン, Hepburn: Tengen Toppa Guren Ragan, lit. "Heaven-Piercing Gurren Lagann"), is a Japanese mecha anime television series animated by Gainax and co-produced by Aniplex and Konami. It ran for 27 episodes on TV Tokyo between April and September 2007. It was directed by Hiroyuki Imaishi and written by veteran playwright Kazuki Nakashima. Gurren Lagann takes place in a fictional future where the Spiral King, Lordgenome, rules Earth and forces mankind to live in isolated subterranean villages. The plot focuses on two teenagers, Simon and Kamina, who live in a subterranean village and wish to go to the surface. Using a mecha known as Lagann, they reach the surface and start fighting alongside other humans against Lordgenome\'s forces before fighting the forces of their true enemy.\nIn North America, although initially announced to be licensed by ADV Films in 2007, the license was transferred to Bandai Entertainme

In [65]:
wikipedia.page("Gurren Lagann").content

'Gurren Lagann, known in Japan as Tengen Toppa Gurren Lagann (Japanese: 天元突破グレンラガン, Hepburn: Tengen Toppa Guren Ragan, lit. "Heaven-Piercing Gurren Lagann"), is a Japanese mecha anime television series animated by Gainax and co-produced by Aniplex and Konami. It ran for 27 episodes on TV Tokyo between April and September 2007. It was directed by Hiroyuki Imaishi and written by veteran playwright Kazuki Nakashima. Gurren Lagann takes place in a fictional future where the Spiral King, Lordgenome, rules Earth and forces mankind to live in isolated subterranean villages. The plot focuses on two teenagers, Simon and Kamina, who live in a subterranean village and wish to go to the surface. Using a mecha known as Lagann, they reach the surface and start fighting alongside other humans against Lordgenome\'s forces before fighting the forces of their true enemy.\nIn North America, although initially announced to be licensed by ADV Films in 2007, the license was transferred to Bandai Entertainme

In [45]:
wikipedia.summary("FIVB Volleyball Women World Cup")

"The FIVB Volleyball Women's World Cup is an international volleyball competition contested by the senior women's national teams of the members of Fédération Internationale de Volleyball (FIVB), the sport's global governing body. Initially the tournament was played in the year following the Olympic Games, but since 1991 the World Cup has been awarded in the year preceding the Olympic Games. The current champion is China, which won its fifth title at the 2019 tournament.\nThe historical format of the competition involves 12 teams, including the automatically qualifying host nation Japan, competing in the tournament phase for the title at venues within the host nation over a period of about two weeks. The World Cup (with exception of the 2019 edition) acts as the first qualification event for the following year's Olympic Games with the top two teams qualifying.\nThe 14 World Cup tournaments have been won by six different national teams. China have won five times. The other World Cup winn

In [52]:
wikipedia.page("FIVB Volleyball Women World Cup").content

'The FIVB Volleyball Women\'s World Cup is an international volleyball competition contested by the senior women\'s national teams of the members of Fédération Internationale de Volleyball (FIVB), the sport\'s global governing body. Initially the tournament was played in the year following the Olympic Games, but since 1991 the World Cup has been awarded in the year preceding the Olympic Games. The current champion is China, which won its fifth title at the 2019 tournament.\nThe historical format of the competition involves 12 teams, including the automatically qualifying host nation Japan, competing in the tournament phase for the title at venues within the host nation over a period of about two weeks. The World Cup (with exception of the 2019 edition) acts as the first qualification event for the following year\'s Olympic Games with the top two teams qualifying.\nThe 14 World Cup tournaments have been won by six different national teams. China have won five times. The other World Cup 

In [91]:
wikipedia.page("FIVB Volleyball Women World Cup").url

'https://en.wikipedia.org/wiki/FIVB_Volleyball_Women%27s_World_Cup'

In [None]:
wikipedia.set_lang('de')
wikipedia.summary("FIVB Volleyball Women World Cup")

## Extract tabular data from Wikipedia into a pandas DataFrame

In [49]:
# First, import the pandas package
import pandas as pd

In [51]:
# You may need to pip install the lxml package
!pip install lxml



In [69]:
# Import the lxml package
import lxml

In [71]:
# Extract tabular data from wikipedia into a Pandas dataframe.
# We will use pandas.read_html

tables = pd.read_html("https://en.wikipedia.org/wiki/FIVB_Volleyball_Women%27s_World_Cup#Results_summary",
                     match='Champions')

In [73]:
# Show the tables variable
tables

[            Year     Host    Unnamed: 2      Champions     Runners-up  \
 0            NaN      NaN           NaN            NaN            NaN   
 1            NaN      NaN           NaN            NaN            NaN   
 2            NaN      NaN           NaN            NaN            NaN   
 3            NaN      NaN           NaN            NaN            NaN   
 4            NaN      NaN           NaN            NaN            NaN   
 5            NaN      NaN           NaN            NaN            NaN   
 6            NaN      NaN           NaN            NaN            NaN   
 7            NaN      NaN           NaN            NaN            NaN   
 8            NaN      NaN           NaN            NaN            NaN   
 9            NaN      NaN           NaN            NaN            NaN   
 10           NaN      NaN           NaN            NaN            NaN   
 11           NaN      NaN           NaN            NaN            NaN   
 12           NaN      NaN           N

In [93]:
# How big is tables?
len(tables)

11

In [95]:
# Let's get only the first result 
results_df = tables[0]

In [97]:
# Display the content of results_df
results_df

Unnamed: 0,Year,Host,Unnamed: 2,Champions,Runners-up,3rd place,4th place,Unnamed: 7,Teams
0,,,,,,,,,
1,,,,,,,,,
2,,,,,,,,,
3,,,,,,,,,
4,,,,,,,,,
5,,,,,,,,,
6,,,,,,,,,
7,,,,,,,,,
8,,,,,,,,,
9,,,,,,,,,


In [105]:
# Let's clean the data:
## Remove all rows that contain only null values

clean_df = results_df.dropna(how='all')
clean_df

Unnamed: 0,Year,Host,Unnamed: 2,Champions,Runners-up,3rd place,4th place,Unnamed: 7,Teams
14,1973 Details,Uruguay,Soviet Union,Japan,South Korea,Peru,10.0,,
15,1977 Details,Japan,Japan,Cuba,South Korea,China,8.0,,
16,1981 Details,Japan,China,Japan,Soviet Union,United States,8.0,,
17,1985 Details,Japan,China,Cuba,Soviet Union,Japan,8.0,,
18,1989 Details,Japan,Cuba,Soviet Union,China,Japan,8.0,,
19,1991 Details,Japan,Cuba,China,Soviet Union,United States,12.0,,
20,1995 Details,Japan,Cuba,Brazil,China,Croatia,12.0,,
21,1999 Details,Japan,Cuba,Russia,Brazil,South Korea,12.0,,
22,2003 Details,Japan,China,Brazil,United States,Italy,12.0,,
23,2007 Details,Japan,Italy,Brazil,United States,Cuba,12.0,,


#### Jump to the `Self-Work Exercises #1` Section

# Read zip files 

In [107]:
unemployment_rates = pd.read_csv("./data/arbeitsmarktstatistik_erwerbslosenquoten_geschlecht.zip", 
                        sep=";",
                        encoding='latin-1')

# Note: "Erwerbslosenquoten" means unemployment rates
# "Arbeitsmarktstatistik" means labour market statistics

In [109]:
# Display the data
unemployment_rates

Unnamed: 0,Unnamed: 1,Unnamed: 2,Unnamed: 3,Unnamed: 4,Unnamed: 5,ILO-Arbeitsmarktstatistik
Datum,"Erwerbslosenquoten Männer insgesamt, in %","Erwerbslosenquoten Männer unter 25 Jahren, in %","Erwerbslosenquoten Männer ab 25 Jahren, in %","Erwerbslosenquoten Frauen insgesamt, in %","Erwerbslosenquoten Frauen unter 25 Jahren, in %","Erwerbslosenquoten Frauen ab 25 Jahren, in %"
01/03/2007,95,127,91,88,87,88
01/04/2007,83,136,77,90,106,87
01/05/2007,85,117,81,85,104,83
01/06/2007,80,125,74,87,129,82
...,...,...,...,...,...,...
01/04/2023,32,75,27,29,50,26
01/05/2023,31,58,28,25,44,23
01/06/2023,32,71,28,28,59,24
01/07/2023,32,69,28,28,71,23
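
`pd.read_csv` can read a zip archive directly when the archive contains a single data file. When it is useful to look inside the archive first, the standard-library `zipfile` module can help. The sketch below builds a tiny archive in memory so that it runs without any file on disk; the file name `rates.csv`, the column names, and the values are all made up, and the `decimal=","` option only reflects the German-style decimal commas of this made-up sample.

```python
import io
import zipfile

import pandas as pd

# Build a tiny zip archive in memory (made-up content, semicolon-separated)
csv_text = "Datum;Quote\n01/03/2007;9,5\n01/04/2007;8,3\n"
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, mode="w") as archive:
    archive.writestr("rates.csv", csv_text)

# Re-open the archive, list its members, and read the first one with pandas
buffer.seek(0)
with zipfile.ZipFile(buffer) as archive:
    members = archive.namelist()
    with archive.open(members[0]) as csv_file:
        rates = pd.read_csv(csv_file, sep=";", decimal=",")

print(members)
print(rates)
```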


<br><br>
# Introduction to JSON

In [111]:
# Import the json package
import json

In [113]:
# Creating student records
student1 = {
    "name": "Alice",
    "age": 15,
    "grade": 10,
    "subjects": ["Math", "Science", "History"],
    "city": "Augsburg"
}

student2 = {
    "name": "Bob",
    "age": 16,
    "grade": 11,
    "subjects": ["English", "Physics", "Geography"],
    "city": "Berlin"
}

student3 = {
    "name": "Carol",
    "age": 14,
    "grade": 9,
    "subjects": ["Art", "Music", "PE"],
    "city": "Cottbus"
}

In [115]:
# Creating a list of student records
student_database = [student1,student2,student3]

In [169]:
# Converting the student database to JSON format
json_data = json.dumps(student_database, indent=4)


With indent=4: Each level of the JSON structure will be indented by 4 spaces, making it easier to read and visually understand the nested structure.

In [173]:
# Printing the JSON data
print(json_data)

[
    {
        "name": "Alice",
        "age": 15,
        "grade": 10,
        "subjects": [
            "Math",
            "Science",
            "History"
        ],
        "city": "Augsburg"
    },
    {
        "name": "Bob",
        "age": 16,
        "grade": 11,
        "subjects": [
            "English",
            "Physics",
            "Geography"
        ],
        "city": "Berlin"
    },
    {
        "name": "Carol",
        "age": 14,
        "grade": 9,
        "subjects": [
            "Art",
            "Music",
            "PE"
        ],
        "city": "Cottbus"
    }
]
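
`json.dumps` produces a plain Python string, and `json.loads` performs the reverse operation. The short sketch below, using a shortened made-up record, shows that the round trip gives back an equal Python object whether or not `indent` is used.

```python
import json

record = {"name": "Alice", "subjects": ["Math", "Science"], "city": "Augsburg"}

compact = json.dumps(record)           # single-line JSON string
pretty = json.dumps(record, indent=4)  # same data, spread over several lines
restored = json.loads(pretty)          # back to Python dicts and lists

print(type(compact).__name__)
print(restored == record)
```

The indentation only affects how the string looks; the data it encodes is unchanged.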


### Read a json file

First, copy and paste the `animals.json` file into the `data` folder of your working directory

### Load the file

In [121]:
## We first load the file
with open("./data/animals.json") as f:
    d = json.load(f)

print(d)

[{'name': 'Ace', 'animal': 'Alpaca', 'tricks': ['spit'], 'demographics': {'sex': 'male', 'age': 2}, 'owner': 'Alice', 'city': 'Augsburg'}, {'name': 'Biscuit', 'animal': 'Beagle', 'tricks': ['sit', 'hunt'], 'demographics': {'sex': 'female', 'age': 1}, 'owner': 'Bob', 'city': 'Berlin'}, {'name': 'Coco', 'animal': 'Chinchilla', 'tricks': ['explore', 'climb'], 'demographics': {'sex': 'male', 'age': 5}, 'owner': 'Carol', 'city': 'Cottbus'}]


### Normalise the loaded data

In [165]:
# To convert each element of the "demographics" section into its own column,
# we use pd.json_normalize
animals_v1 = pd.json_normalize(d)

In [167]:
# Display animals_v1
animals_v1

Unnamed: 0,name,animal,tricks,owner,city,demographics.sex,demographics.age
0,Ace,Alpaca,[spit],Alice,Augsburg,male,2
1,Biscuit,Beagle,"[sit, hunt]",Bob,Berlin,female,1
2,Coco,Chinchilla,"[explore, climb]",Carol,Cottbus,male,5
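
`pd.json_normalize` flattens nested dictionaries, but list values such as `tricks` stay as lists inside a single cell. If one row per trick is preferred, `DataFrame.explode` is one option. The sketch below uses a shortened copy of the animals data so that it runs without the file; the variable names are illustrative.

```python
import pandas as pd

# Shortened copy of the animals data shown above
animals = [
    {"name": "Ace", "tricks": ["spit"], "demographics": {"sex": "male", "age": 2}},
    {"name": "Biscuit", "tricks": ["sit", "hunt"], "demographics": {"sex": "female", "age": 1}},
    {"name": "Coco", "tricks": ["explore", "climb"], "demographics": {"sex": "male", "age": 5}},
]

# Nested dictionaries become columns such as "demographics.sex"
flat = pd.json_normalize(animals)

# Each element of a list in "tricks" gets its own row
one_trick_per_row = flat.explode("tricks").reset_index(drop=True)

print(one_trick_per_row[["name", "tricks"]])
```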


### Read a json file as a dataframe directly

In [177]:
animals_v2 = pd.read_json('./data/animals.json')

In [179]:
# Display animals_v2
animals_v2

Unnamed: 0,name,animal,tricks,demographics,owner,city
0,Ace,Alpaca,[spit],"{'sex': 'male', 'age': 2}",Alice,Augsburg
1,Biscuit,Beagle,"[sit, hunt]","{'sex': 'female', 'age': 1}",Bob,Berlin
2,Coco,Chinchilla,"[explore, climb]","{'sex': 'male', 'age': 5}",Carol,Cottbus


### Self-work exercises #1: 4 points

#### What problems do you see with `clean_df`? Can you clean the DataFrame to obtain a clean one?

Write a series of code commands in order to clean the DataFrame:
- create a deep copy of the DataFrame `clean_df` and assign it to the variable `df`
- get the list of useful columns, without copy-pasting them (use code!)
- drop the columns that contain missing values
- rename the columns so that each column corresponds to the correct header
- transform the `Year` column to remove `Details`, which we don't need

*Tip: this might be useful https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.dropna.html*

In [159]:
# Create a deep copy of the dataframe
df = clean_df.copy(deep=True)

# Drop the "Unnamed: 7" column, which holds no data
df.drop(['Unnamed: 7'], axis='columns', inplace=True)
# The useful column names could also be collected with code, for example:
# columns_list = [c for c in clean_df.columns if not c.startswith("Unnamed")]

# Remove the columns that contain only null values
df.dropna(how='all', axis='columns', inplace=True)

# Give the right column names to the dataframe:
# the headers are offset by one column from their values
df.rename(columns={
    'Unnamed: 2': 'Champions',
    'Champions': 'Runners-up',
    'Runners-up': '3rd Place',
    '3rd place': '4th Place',
    '4th place': 'Teams',
}, inplace=True)
# df.columns = columns_list

# Remove "Details" from the "Year" column, including the leftover space
df['Year'] = df['Year'].str.replace('Details', '', regex=False).str.strip()

# Display the content of the dataframe
df

Unnamed: 0,Year,Host,Champions,Runners-up,3rd Place,4th Place,Teams
14,1973,Uruguay,Soviet Union,Japan,South Korea,Peru,10.0
15,1977,Japan,Japan,Cuba,South Korea,China,8.0
16,1981,Japan,China,Japan,Soviet Union,United States,8.0
17,1985,Japan,China,Cuba,Soviet Union,Japan,8.0
18,1989,Japan,Cuba,Soviet Union,China,Japan,8.0
19,1991,Japan,Cuba,China,Soviet Union,United States,12.0
20,1995,Japan,Cuba,Brazil,China,Croatia,12.0
21,1999,Japan,Cuba,Russia,Brazil,South Korea,12.0
22,2003,Japan,China,Brazil,United States,Italy,12.0
23,2007,Japan,Italy,Brazil,United States,Cuba,12.0
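
The exercise asks for the useful column names to be obtained with code rather than typed by hand. One possible approach is sketched below on a small made-up DataFrame that has the same problem as `clean_df` (headers offset from their values, plus empty columns). The names `raw` and `useful_columns` are illustrative, and the sketch assumes that the number of non-empty columns equals the number of useful headers.

```python
import pandas as pd

# Toy frame: the values sit one column to the left of their headers
raw = pd.DataFrame(
    {
        "Year": ["1973 Details", "1977 Details"],
        "Host": ["Uruguay", "Japan"],
        "Unnamed: 2": ["Soviet Union", "Japan"],  # actually the champions
        "Champions": [10.0, 8.0],                 # actually the number of teams
        "Unnamed: 4": [None, None],
        "Teams": [None, None],
    }
)

df = raw.copy(deep=True)

# Keep every header that is not an automatically generated "Unnamed" one
useful_columns = [c for c in df.columns if not c.startswith("Unnamed")]

# Drop the columns that contain only missing values
df = df.dropna(how="all", axis="columns")

# Only relabel when the counts match, otherwise the headers would be misaligned
if len(df.columns) != len(useful_columns):
    raise ValueError("Column count does not match the number of useful headers")
df.columns = useful_columns

# Remove "Details" and the leftover space from the Year column
df["Year"] = df["Year"].str.replace("Details", "", regex=False).str.strip()

print(df)
```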


### Self-work exercises #2: 6 points

#### Use the python commands to answer the following questions

**1. What type is the resulting data (the loaded json: d) ?**

In [185]:
type(d)

list

**2. How many items does the resulting data have (the loaded json: d) ?**

In [187]:
len(d)

3

**3. What is the type of each item in the resulting data (the loaded json: d) ?**

In [191]:
for item in d:
    print(type(item))

<class 'dict'>
<class 'dict'>
<class 'dict'>


**4. How many tricks does Coco know ? Use the code to output it**

In [233]:
# Coco is the third item in the list (index 2)
len(d[2]['tricks'])

2

**5. How can you change the column name to be demographics_sex and demographics_age instead of demographics.sex and demographics.age?**

Save it into animals_v3 variable

*Tip: research online the pandas.json_normalize method*

In [239]:
animals_v3 = pd.json_normalize(d,sep='_')
animals_v3

Unnamed: 0,name,animal,tricks,owner,city,demographics_sex,demographics_age
0,Ace,Alpaca,[spit],Alice,Augsburg,male,2
1,Biscuit,Beagle,"[sit, hunt]",Bob,Berlin,female,1
2,Coco,Chinchilla,"[explore, climb]",Carol,Cottbus,male,5


**6. Following the example of `student_database`, create the `teacher_database`**

Hints:
- the teacher is very wise, therefore they are at least 100 years old!
- Selma's city starts with S and is located in Germany
- The subject is the course you are taking

In [247]:
### Complete the following program:

teacher = {
    "name": "Selma",
    "age": 100,
    "grade": "Teacher",
    "subjects": "Data Scraping",
    "city": "Schönstedt"
}



from io import StringIO

teacher_database = [teacher]
teacher_data = json.dumps(teacher_database)
# Wrap the JSON string in StringIO: passing a literal JSON string to read_json is deprecated
teachers_df = pd.read_json(StringIO(teacher_data))


In [249]:
teachers_df

Unnamed: 0,name,age,grade,subjects,city
0,Selma,100,Teacher,Data Scraping,Schönstedt
