<font size="+5">#02 | Dissecting the Object with Pandas DataFrame Properties</font>

- Subscribe to my [Blog ↗](https://blog.pythonassembly.com/)
- Let's keep in touch on [LinkedIn ↗](https://www.linkedin.com/in/jsulopz) 😄

## All that glitters is not gold

> - Not all objects can call the same functions
> - Even though they may store the same information

### `list`

> - Create a list with your math `grades`

In [1]:
lista_notas = [10, 7, 8]

> - Compute the mean

In [2]:
lista_notas.mean()

AttributeError: 'list' object has no attribute 'mean'

> - A `list` has no `mean()` method
> - But you can use `sum()` and `len()`

In [3]:
sum(lista_notas)

25

In [4]:
len(lista_notas)

3

> - And divide the sum by the number of exams you took

In [5]:
sum(lista_notas)/len(lista_notas)

8.333333333333334
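
As a side note, Python's standard library does ship a `mean` function in the `statistics` module. It is a standalone function that receives the list as an argument, not a method the list itself can call. A minimal sketch, where the names `manual_mean` and `stdlib_mean` are illustrative and not part of the original notebook:

```python
from statistics import mean

lista_notas = [10, 7, 8]

# Manual computation, as in the cells above
manual_mean = sum(lista_notas) / len(lista_notas)

# Standard-library function: the list is passed in as an argument
stdlib_mean = mean(lista_notas)

print(manual_mean, stdlib_mean)
```

Both approaches give the same number; the difference is only in who "owns" the operation.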

> - [ ] Isn't there an `object` that can calculate the `mean()`?

### `Series`

In [6]:
import pandas as pd

In [7]:
series_notas = pd.Series([10, 7, 8])

In [8]:
series_notas

0    10
1     7
2     8
dtype: int64

In [9]:
series_notas.mean()

8.333333333333334

In [10]:
lista_notas

[10, 7, 8]

In [11]:
lista_notas.mean()

AttributeError: 'list' object has no attribute 'mean'

In [12]:
type(series_notas)

pandas.core.series.Series

In [13]:
type(lista_notas)

list

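> - Type `.` after the object's name and press the `[tab]` key to see the `functions/methods` of the object
> - Running the unfinished line instead of pressing `[tab]` raises a `SyntaxError`

In [14]:
lista_notas.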

SyntaxError: invalid syntax (3389551673.py, line 1)

In [15]:
series_notas.

SyntaxError: invalid syntax (1457235964.py, line 1)
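
Tab completion only works interactively. Outside a notebook, the built-in functions `hasattr()` and `dir()` answer the same question in code. The sketch below assumes the two objects created earlier; `list_methods` is a name introduced here for illustration:

```python
import pandas as pd

lista_notas = [10, 7, 8]
series_notas = pd.Series(lista_notas)

# hasattr() answers "can this object call that method?"
print(hasattr(lista_notas, 'mean'))   # the list has no mean()
print(hasattr(series_notas, 'mean'))  # the Series does

# dir() returns every attribute name; keep only the public ones
list_methods = [name for name in dir(lista_notas) if not name.startswith('_')]
print(list_methods)
```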

## How to Access the `items` of an `Object`

### The `list`

> - Create a list of your best friends

In [16]:
lista_bf = ['maria', 'pepe', 'alberto']

> - Access the 2nd element `'pepe'` ↓

In [17]:
lista_bf[1]

'pepe'

### The `dict`

In [18]:
diccionario_bf = {'primera': 'maria', 'segundo': 'pepe', 'tercero': 'alberto'}

> - Access the 2nd element `'pepe'` ↓

In [19]:
diccionario_bf[1]

KeyError: 1

In [20]:
diccionario_bf.keys()

dict_keys(['primera', 'segundo', 'tercero'])

In [21]:
diccionario_bf['segundo']

'pepe'
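
The contrast between the two objects can be condensed into a short sketch: a `dict` is read by key, and a position only makes sense after converting its values into a `list`. The variable names `by_key`, `by_position`, and `missing` are illustrative additions:

```python
diccionario_bf = {'primera': 'maria', 'segundo': 'pepe', 'tercero': 'alberto'}

# Access by key: the normal way to read a dict
by_key = diccionario_bf['segundo']

# Access by position: turn the values into a list first
# (dicts keep insertion order in Python 3.7+)
by_position = list(diccionario_bf.values())[1]

# .get() returns a default instead of raising KeyError for a missing key
missing = diccionario_bf.get(1, 'not found')

print(by_key, by_position, missing)
```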

## Store the information in Python `objects` for the Best Tennis Players

> - income
> - titles
> - grand slams
> - turned professional
> - wins
> - losses

### Create a `dictionary` for Roger Federer

In [22]:
roger = {'income': 130, 'titles': 103, 'grand slams': 20, 'turned professional': 1998, 'wins': 1251, 'losses': 275}

In [23]:
roger

{'income': 130,
 'titles': 103,
 'grand slams': 20,
 'turned professional': 1998,
 'wins': 1251,
 'losses': 275}

### Create a `dictionary` for Rafa Nadal

In [24]:
rafa = {'income': 127, 'titles': 90, 'grand slams': 21, 'turned professional': 2001, 'wins': 1038, 'losses': 209}

In [25]:
rafa

{'income': 127,
 'titles': 90,
 'grand slams': 21,
 'turned professional': 2001,
 'wins': 1038,
 'losses': 209}

### Create a `dictionary` for Novak Djokovic

In [26]:
nole = {'income': 154, 'titles': 86, 'grand slams': 20, 'turned professional': 2003, 'wins': 989, 'losses': 199}

In [27]:
nole

{'income': 154,
 'titles': 86,
 'grand slams': 20,
 'turned professional': 2003,
 'wins': 989,
 'losses': 199}

### How much income did all of them earn?

> - You may put all of them into a `list`
> - And `sum()` the `income`

In [28]:
lista_best_players = [roger, rafa, nole]

In [29]:
lista_best_players.sum()

AttributeError: 'list' object has no attribute 'sum'

> - `sum()` is not a method that a plain `list` object provides
> - [ ] Could we convert the list into a more powerful object that can compute the `sum()`?
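
For comparison, the total can still be computed in plain Python by picking the value out of each dictionary by hand. This is a sketch of that workaround, not the approach the notebook goes on to recommend; `total_income` is a name introduced here:

```python
roger = {'income': 130, 'titles': 103, 'grand slams': 20, 'turned professional': 1998, 'wins': 1251, 'losses': 275}
rafa = {'income': 127, 'titles': 90, 'grand slams': 21, 'turned professional': 2001, 'wins': 1038, 'losses': 209}
nole = {'income': 154, 'titles': 86, 'grand slams': 20, 'turned professional': 2003, 'wins': 989, 'losses': 199}

lista_best_players = [roger, rafa, nole]

# Pull the 'income' value out of each dict, then add the values up
total_income = sum(player['income'] for player in lista_best_players)

print(total_income)  # 411
```

It works, but every new question (total titles, total wins) needs another hand-written loop, which is the motivation for a tabular object.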

In [30]:
import pandas as pd

In [31]:
pd.DataFrame(lista_best_players)

| | income | titles | grand slams | turned professional | wins | losses |
|---|---|---|---|---|---|---|
| 0 | 130 | 103 | 20 | 1998 | 1251 | 275 |
| 1 | 127 | 90 | 21 | 2001 | 1038 | 209 |
| 2 | 154 | 86 | 20 | 2003 | 989 | 199 |


In [32]:
df_best_players = pd.DataFrame(lista_best_players, index=['roger', 'rafa', 'nole'])

In [33]:
df_best_players

| | income | titles | grand slams | turned professional | wins | losses |
|---|---|---|---|---|---|---|
| roger | 130 | 103 | 20 | 1998 | 1251 | 275 |
| rafa | 127 | 90 | 21 | 2001 | 1038 | 209 |
| nole | 154 | 86 | 20 | 2003 | 989 | 199 |


> - Access the `income` column
> - and compute the `sum()`

In [34]:
df_best_players['income']

roger    130
rafa     127
nole     154
Name: income, dtype: int64

In [35]:
df_best_players['income'].sum()

411

> - [ ] Which type of `object` is the table?

In [36]:
type(df_best_players)

pandas.core.frame.DataFrame

> - [ ] What else can we do with this `object`?

In [37]:
df_best_players.

SyntaxError: invalid syntax (4064718670.py, line 1)

## Can we select specific parts of the `DataFrame`?

> - [ ] names of rows

In [38]:
df_best_players.index

Index(['roger', 'rafa', 'nole'], dtype='object')

> - [ ] names of columns

In [39]:
df_best_players.columns

Index(['income', 'titles', 'grand slams', 'turned professional', 'wins',
       'losses'],
      dtype='object')

> - [ ] number of rows & columns

In [40]:
df_best_players.shape

(3, 6)
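
Beyond these three properties, row and column names can be combined to pick out a row, a single cell, or a sub-table with `.loc`. The sketch below rebuilds a reduced version of the table (three columns only, to keep it short); `rafa_row`, `rafa_titles`, and `subset` are illustrative names:

```python
import pandas as pd

# Reduced copy of df_best_players with three of the six columns
df_best_players = pd.DataFrame(
    {'income': [130, 127, 154], 'titles': [103, 90, 86], 'grand slams': [20, 21, 20]},
    index=['roger', 'rafa', 'nole'],
)

# One row, selected by its name: the result is a Series
rafa_row = df_best_players.loc['rafa']

# One cell, selected by row name and column name
rafa_titles = df_best_players.loc['rafa', 'titles']

# Several rows and columns at once: the result is a smaller DataFrame
subset = df_best_players.loc[['roger', 'nole'], ['income', 'titles']]

print(rafa_titles)
print(subset.shape)
```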

## Vocabulary Recap

1. object/
2. function/method
3. parameter/argument
4. library
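
One way to attach these terms to concrete lines of code, as a rough sketch (the variable names `grades` and `average` are illustrative, not from the notebook):

```python
import pandas as pd  # library: a collection of ready-made objects and functions

# pd.Series(...) is a function call that builds an object.
# `data` is the parameter (the name in the function's signature);
# [10, 7, 8] is the argument (the value passed for that parameter).
grades = pd.Series(data=[10, 7, 8])

# .mean() is a method: a function that belongs to the object
average = grades.mean()

print(type(grades).__name__, round(average, 2))
```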

## A practical case

> - Retrieve information from a URL
> - and convert it into a DataFrame
> - to operate with the Data

### Retrieve the Information from a `url`

https://github.com/jsulopz/data

> - Find the `function()` that gets the content from a `url`

In [41]:
import requests

res = requests.get('https://raw.githubusercontent.com/jsulopz/data/main/best_tennis_players_stats.json')
res

<Response [200]>

> - Is the object just `<Response [200]>`?
> - Or might it contain more information/data?

In [42]:
res.

SyntaxError: invalid syntax (1459325066.py, line 1)

> - How can you access the data we see [here](https://raw.githubusercontent.com/jsulopz/data/main/best_tennis_players_stats.json)?

In [43]:
res.content

b'{"income":{"roger":130,"rafa":127,"nole":154},"titles":{"roger":103,"rafa":90,"nole":86},"grand slams":{"roger":20,"rafa":21,"nole":20},"turned professional":{"roger":1998,"rafa":2001,"nole":2003},"wins":{"roger":1251,"rafa":1038,"nole":989},"losses":{"roger":275,"rafa":209,"nole":199}}'

In [44]:
pd.DataFrame(res.content)

ValueError: DataFrame constructor not properly called!

In [45]:
pd.DataFrame('{"income":{"roger":130,"rafa":127,"nole":154},"titles":{"roger":103,"rafa":90,"nole":86},"grand slams":{"roger":20,"rafa":21,"nole":20},"turned professional":{"roger":1998,"rafa":2001,"nole":2003},"wins":{"roger":1251,"rafa":1038,"nole":989},"losses":{"roger":275,"rafa":209,"nole":199}}')

ValueError: DataFrame constructor not properly called!

In [46]:
pd.DataFrame('{"nombres": ["juan", "pepe"], "peso": [67,45]}')

ValueError: DataFrame constructor not properly called!

In [47]:
pd.DataFrame({"nombres": ["juan", "pepe"], "peso": [67,45]})

| | nombres | peso |
|---|---|---|
| 0 | juan | 67 |
| 1 | pepe | 45 |


> - Is there a way to get the data from the `url`
> - just like ↓

In [48]:
{"nombres": ["juan", "pepe"], "peso": [67,45]}

{'nombres': ['juan', 'pepe'], 'peso': [67, 45]}

> - and not this ↓

In [49]:
b'{"nombres": ["juan", "pepe"], "peso": [67,45]}'

b'{"nombres": ["juan", "pepe"], "peso": [67,45]}'

> - Apply the same discipline (`.` + `[tab]`) to find a `function()` within the object

In [50]:
res.json()

{'income': {'roger': 130, 'rafa': 127, 'nole': 154},
 'titles': {'roger': 103, 'rafa': 90, 'nole': 86},
 'grand slams': {'roger': 20, 'rafa': 21, 'nole': 20},
 'turned professional': {'roger': 1998, 'rafa': 2001, 'nole': 2003},
 'wins': {'roger': 1251, 'rafa': 1038, 'nole': 989},
 'losses': {'roger': 275, 'rafa': 209, 'nole': 199}}

In [51]:
pd.DataFrame(res.json())

| | income | titles | grand slams | turned professional | wins | losses |
|---|---|---|---|---|---|---|
| roger | 130 | 103 | 20 | 1998 | 1251 | 275 |
| rafa | 127 | 90 | 21 | 2001 | 1038 | 209 |
| nole | 154 | 86 | 20 | 2003 | 989 | 199 |
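
The step from bytes to a dictionary can be reproduced without any network call. Roughly speaking, `res.json()` parses the response body as JSON, which the standard-library `json.loads` also does for a bytes value. In this sketch `raw` is a hand-typed, shortened copy of `res.content` (two columns only), and `data` and `df_offline` are illustrative names:

```python
import json

import pandas as pd

# Shortened stand-in for res.content: bytes, not a dict
raw = b'{"income":{"roger":130,"rafa":127,"nole":154},"titles":{"roger":103,"rafa":90,"nole":86}}'

# Parse the JSON text into nested Python dictionaries
data = json.loads(raw)
print(type(raw).__name__, '->', type(data).__name__)

# A nested dict maps cleanly to a table:
# outer keys become columns, inner keys become the row names
df_offline = pd.DataFrame(data)
print(df_offline)
```

This is why `pd.DataFrame(res.content)` fails while `pd.DataFrame(res.json())` succeeds: the constructor needs a Python structure, not raw text.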


### Recap

In [52]:
res = requests.get(url='https://raw.githubusercontent.com/jsulopz/data/main/best_tennis_players_stats.json')

In [53]:
res.content

b'{"income":{"roger":130,"rafa":127,"nole":154},"titles":{"roger":103,"rafa":90,"nole":86},"grand slams":{"roger":20,"rafa":21,"nole":20},"turned professional":{"roger":1998,"rafa":2001,"nole":2003},"wins":{"roger":1251,"rafa":1038,"nole":989},"losses":{"roger":275,"rafa":209,"nole":199}}'

In [54]:
pd.DataFrame(res.content)

ValueError: DataFrame constructor not properly called!

In [55]:
res.json()

{'income': {'roger': 130, 'rafa': 127, 'nole': 154},
 'titles': {'roger': 103, 'rafa': 90, 'nole': 86},
 'grand slams': {'roger': 20, 'rafa': 21, 'nole': 20},
 'turned professional': {'roger': 1998, 'rafa': 2001, 'nole': 2003},
 'wins': {'roger': 1251, 'rafa': 1038, 'nole': 989},
 'losses': {'roger': 275, 'rafa': 209, 'nole': 199}}

In [56]:
pd.DataFrame(res.json())

| | income | titles | grand slams | turned professional | wins | losses |
|---|---|---|---|---|---|---|
| roger | 130 | 103 | 20 | 1998 | 1251 | 275 |
| rafa | 127 | 90 | 21 | 2001 | 1038 | 209 |
| nole | 154 | 86 | 20 | 2003 | 989 | 199 |


### Shouldn't it be easier?

> - Apply the same discipline to find a `function()` within a library

In [57]:
pd.read_json('https://raw.githubusercontent.com/jsulopz/data/main/best_tennis_players_stats.json')

| | income | titles | grand slams | turned professional | wins | losses |
|---|---|---|---|---|---|---|
| roger | 130 | 103 | 20 | 1998 | 1251 | 275 |
| rafa | 127 | 90 | 21 | 2001 | 1038 | 209 |
| nole | 154 | 86 | 20 | 2003 | 989 | 199 |


In [58]:
df = pd.read_json('https://raw.githubusercontent.com/jsulopz/data/main/best_tennis_players_stats.json')

In [59]:
df

| | income | titles | grand slams | turned professional | wins | losses |
|---|---|---|---|---|---|---|
| roger | 130 | 103 | 20 | 1998 | 1251 | 275 |
| rafa | 127 | 90 | 21 | 2001 | 1038 | 209 |
| nole | 154 | 86 | 20 | 2003 | 989 | 199 |


> - And now calculate the `sum()` of the `income`

In [60]:
df.income.sum()

411
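
Two follow-ups are worth sketching. First, `sum()` can be applied to the whole table to total every column at once. Second, attribute access such as `df.income` only works for column names that are valid Python identifiers; a name with a space, like `grand slams`, needs brackets. To keep the example runnable offline, the JSON text is pasted into a hypothetical `json_text` variable and wrapped in `StringIO` instead of being read from the URL:

```python
from io import StringIO

import pandas as pd

# Local copy of the JSON published at the URL above
json_text = (
    '{"income":{"roger":130,"rafa":127,"nole":154},'
    '"titles":{"roger":103,"rafa":90,"nole":86},'
    '"grand slams":{"roger":20,"rafa":21,"nole":20},'
    '"turned professional":{"roger":1998,"rafa":2001,"nole":2003},'
    '"wins":{"roger":1251,"rafa":1038,"nole":989},'
    '"losses":{"roger":275,"rafa":209,"nole":199}}'
)

df = pd.read_json(StringIO(json_text))

# Bracket access works for every column name, including ones with spaces
total_income = df['income'].sum()
total_grand_slams = df['grand slams'].sum()

# sum() on the DataFrame totals each column and returns a Series
totals = df.sum()

print(total_income, total_grand_slams)
print(totals)
```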