## Session 14 Homework 

### Using `json` files in Python

Now that we have learned about context managers, we can use them together with the `json` module to read and write `json` files.

#### 1. Reading `json` files

In Python we can use the `json` module to read and write `json` files. But first we need to understand what `json` is.

A JSON file is a file that stores simple data structures and objects in JavaScript Object Notation (JSON) format, which is a standard data interchange format. It is primarily used for transmitting data between a web application and a server. JSON files are lightweight, text-based, human-readable, and can be edited using a text editor.
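To make the link between JSON and Python concrete, here is a small sketch using `json.loads` and `json.dumps`, which work on strings instead of files. JSON objects become Python dictionaries, arrays become lists, `true`/`false` become `True`/`False`, and `null` becomes `None`. The record below is a made-up example, not a line from the homework file.

```python
import json

# A hypothetical JSON document stored in a string
text = '{"artistName": "C. Tangana", "msPlayed": 162493, "liked": true, "tags": ["latin", "pop"], "album": null}'

# json.loads parses a JSON string into Python objects
record = json.loads(text)

print(type(record))          # <class 'dict'>
print(record["liked"])       # True  (JSON true -> Python True)
print(record["album"])       # None  (JSON null -> Python None)
print(type(record["tags"]))  # <class 'list'>

# json.dumps goes the other way: Python objects -> JSON string
print(json.dumps({"ok": True, "missing": None}))  # {"ok": true, "missing": null}
```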

We have to import the `json` module to use it.

```python
import json
```

Since `json` is part of Python's standard library, there is nothing to install: once it is imported, we can use the `json.load()` function to read a `json` file.

```python
with open('data.json', 'r') as f:
    data = json.load(f)
```
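Writing works the same way in reverse: open the file in `'w'` mode and pass the data to `json.dump()`. The sketch below writes a few hypothetical records (shaped like the ones in `spotify.json`) to a throwaway file in a temporary folder, so no real file is overwritten, and then reads them back with `json.load()`.

```python
import json
import os
import tempfile

# Hypothetical records with the same keys as spotify.json
plays = [
    {"endTime": "2021-03-03 09:44", "artistName": "C. Tangana", "trackName": "Nunca Estoy", "msPlayed": 162493},
    {"endTime": "2021-03-03 09:49", "artistName": "C. Tangana", "trackName": "Párteme La Cara", "msPlayed": 167866},
]

# A throwaway path inside a fresh temporary directory
path = os.path.join(tempfile.mkdtemp(), "sample.json")

# Writing: 'w' mode; ensure_ascii=False keeps characters like "á" readable in the file
with open(path, "w", encoding="utf-8") as f:
    json.dump(plays, f, ensure_ascii=False, indent=2)

# Reading: 'r' mode; json.load parses the file back into a list of dicts
with open(path, "r", encoding="utf-8") as f:
    loaded = json.load(f)

print(loaded == plays)  # True
```

Because `ensure_ascii=False` writes non-ASCII characters as-is, the file must be opened with an explicit `encoding="utf-8"`.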

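```python
import json

# The track names include non-ASCII characters (e.g. "Párteme"), so state the encoding explicitly
with open('spotify.json', 'r', encoding='utf-8') as f:
    data = json.load(f)
```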

If we take a peek at the file, we can see that, in this case, it contains a list of dictionaries, each one describing a single play.

We already know that a list of dictionaries can be turned into a pandas DataFrame, so we can do the same with the data loaded from the `json` file.
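As a reminder of how that conversion behaves, here is a minimal sketch on a few hypothetical records (not the real file). Each dictionary becomes a row and its keys become the columns; the third record deliberately leaves out `msPlayed` to show what happens when a key is missing. The names `sample` and `sample_df` are chosen so they don't overwrite `data` and `df`.

```python
import pandas as pd

# Hypothetical records with the same keys as spotify.json;
# the third one deliberately leaves out "msPlayed"
sample = [
    {"endTime": "2021-03-03 09:44", "artistName": "C. Tangana", "trackName": "Nunca Estoy", "msPlayed": 162493},
    {"endTime": "2021-03-03 09:49", "artistName": "C. Tangana", "trackName": "Párteme La Cara", "msPlayed": 167866},
    {"endTime": "2021-03-03 09:52", "artistName": "C. Tangana", "trackName": "Ingobernable"},
]

# Each dictionary becomes a row and its keys become the column names
sample_df = pd.DataFrame(sample)
print(sample_df.columns.tolist())  # ['endTime', 'artistName', 'trackName', 'msPlayed']
print(sample_df.shape)             # (3, 4)

# A key that is missing from one dictionary shows up as NaN in that row
print(sample_df["msPlayed"].isna().sum())  # 1
```

Note that the `NaN` forces `msPlayed` to become a float column in this sample.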


```python
import pandas as pd

df = pd.DataFrame(data)
df.head()
```

|   | endTime | artistName | trackName | msPlayed |
|---|---|---|---|---|
| 0 | 2021-03-03 09:44 | C. Tangana | Nunca Estoy | 162493 |
| 1 | 2021-03-03 09:49 | C. Tangana | Párteme La Cara | 167866 |
| 2 | 2021-03-03 09:52 | C. Tangana | Ingobernable | 187053 |
| 3 | 2021-03-03 09:53 | C. Tangana | Nominao | 85674 |
| 4 | 2021-03-03 09:56 | C. Tangana | Un Veneno - G-Mix | 193693 |


This file contains 4 columns:

* `endTime` - the time the play ended
* `artistName` - the name of the artist
* `trackName` - the name of the track
* `msPlayed` - how long the track was actually played, in milliseconds
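Milliseconds are hard to read at a glance. A minimal sketch converting them to minutes (1 minute = 60,000 ms), using two values from the table above; the column name `minutesPlayed` is our own choice, not part of the Spotify data.

```python
import pandas as pd

# Two rows copied from the table above
minutes_df = pd.DataFrame({
    "trackName": ["Nunca Estoy", "Nominao"],
    "msPlayed": [162493, 85674],
})

# 1 minute = 60 seconds = 60,000 milliseconds
minutes_df["minutesPlayed"] = minutes_df["msPlayed"] / 60_000

print(minutes_df["minutesPlayed"].round(2).tolist())  # [2.71, 1.43]
```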

### Exercise 1

Using `groupby`, find the total number of milliseconds played by each artist, and order them from most to least played.

```python
df.groupby('artistName')['msPlayed'].sum().sort_values(ascending=False)
```

```
artistName
Tame Impala            69056722
Nujabes                47705726
ZOO                    41212084
Los Chikos del Maiz    27667451
C. Tangana             25543931
                         ...   
One Direction                 0
Leona Lewis                   0
Morgan Wallen                 0
Usher                         0
Kesha                         0
Name: msPlayed, Length: 618, dtype: int64
```
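The result above has 618 artists and is expressed in milliseconds, and the bottom of the list holds artists with 0 ms played. A possible follow-up, sketched here on hypothetical toy data rather than the real history, keeps only the top artists, converts them to hours (3,600,000 ms per hour), and lists the artists that were never actually played.

```python
import pandas as pd

# Hypothetical toy data standing in for the real listening history
toy = pd.DataFrame({
    "artistName": ["Tame Impala", "Nujabes", "Tame Impala", "ZOO", "Kesha"],
    "msPlayed":   [240000, 180000, 300000, 120000, 0],
})

# Same pattern as Exercise 1: total milliseconds per artist, largest first
totals = toy.groupby("artistName")["msPlayed"].sum().sort_values(ascending=False)

# Keep the top 3 and express them in hours
top_hours = (totals.head(3) / 3_600_000).round(3)
print(top_hours)

# Artists that appear in the history but with no playback time at all
never_played = totals[totals == 0].index.tolist()
print(never_played)  # ['Kesha']
```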

### Exercise 2

Using `groupby`, find the total number of milliseconds played by each track, and order them from most to least played.

### Date and time in Python and pandas

Since the `endTime` column holds a date and time, we can take advantage of it to create new columns.

First, we need to convert the `endTime` column from text to pandas' datetime type.

```python
df['endTime'] = pd.to_datetime(df['endTime'])
```
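The call above lets pandas infer the date format. As a sketch of an alternative, an explicit `format` string documents the expected layout and makes `pd.to_datetime` raise an error if a value doesn't match it. The two sample strings are taken from the first rows of the table shown earlier.

```python
import pandas as pd

# Strings in the same "YYYY-MM-DD HH:MM" layout as the endTime column
times = pd.Series(["2021-03-03 09:44", "2021-03-03 09:49"])

# An explicit format string tells pandas exactly how to read each value
parsed = pd.to_datetime(times, format="%Y-%m-%d %H:%M")

# The result is a datetime column, no longer plain text
print(pd.api.types.is_datetime64_any_dtype(parsed))  # True

# Subtracting two timestamps gives a Timedelta
print(parsed[1] - parsed[0])  # 0 days 00:05:00
```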

Through the `.dt` accessor, we can extract several pieces of information from this column:

* `year`
* `month`
* `day`
* `hour`
* `minute`
* `weekday` (Monday = 0 through Sunday = 6)
...
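A short sketch of these accessors on two timestamps: the first comes from the table above, the second is a hypothetical Sunday evening added for contrast. `.dt.day_name()` is handy when the numeric weekday is hard to interpret.

```python
import pandas as pd

# First value from the table above; the second is a hypothetical Sunday
ts = pd.to_datetime(pd.Series(["2021-03-03 09:44", "2021-03-07 18:30"]))

# .dt.weekday numbers the days Monday=0 ... Sunday=6
print(ts.dt.weekday.tolist())     # [2, 6]

# .dt.day_name() gives the English name of the day instead
print(ts.dt.day_name().tolist())  # ['Wednesday', 'Sunday']

# .dt.hour is the kind of column you could group by to study listening per hour
print(ts.dt.hour.tolist())        # [9, 18]
```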

```python
df['year'] = df['endTime'].dt.year
df['month'] = df['endTime'].dt.month
df['day'] = df['endTime'].dt.day
df['hour'] = df['endTime'].dt.hour
df['minute'] = df['endTime'].dt.minute
df['weekday'] = df['endTime'].dt.weekday

df.head()
```

|   | endTime | artistName | trackName | msPlayed | year | month | day | hour | minute | weekday |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2021-03-03 09:44:00 | C. Tangana | Nunca Estoy | 162493 | 2021 | 3 | 3 | 9 | 44 | 2 |
| 1 | 2021-03-03 09:49:00 | C. Tangana | Párteme La Cara | 167866 | 2021 | 3 | 3 | 9 | 49 | 2 |
| 2 | 2021-03-03 09:52:00 | C. Tangana | Ingobernable | 187053 | 2021 | 3 | 3 | 9 | 52 | 2 |
| 3 | 2021-03-03 09:53:00 | C. Tangana | Nominao | 85674 | 2021 | 3 | 3 | 9 | 53 | 2 |
| 4 | 2021-03-03 09:56:00 | C. Tangana | Un Veneno - G-Mix | 193693 | 2021 | 3 | 3 | 9 | 56 | 2 |


### Exercise 3

Which artist was played the most in each month?

### Exercise 4

In which month did I listen to the most music?

In which month did I listen to the least music?

### Exercise 5

Is there an hour of the day in which I listen to more music? 

Based on the data, how would you describe my listening habits over the course of the day?