<style> .test { margin-left: 30px;}</style>
<img src="assets/seems_legit.jpg" alt="Smiley face" width="40%" align="left" />


# A guide to getting it wrong every time when selecting data


Resources:

https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html 

https://medium.com/dunder-data/selecting-subsets-of-data-in-pandas-part-4-c4216f84d388

https://www.youtube.com/watch?v=4R4WsDJ-KVc

https://nedbatchelder.com/text/names1.html 

https://tomaugspurger.github.io/modern-7-timeseries

In [36]:
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"

## References to the same object in memory

### Mutable case

In [37]:
a1 = [0, 1, 2, 3]
b1 = a1
b1[0] = 99

print('b1 : ', b1)
print('a1 : ', a1)

b1 :  [99, 1, 2, 3]
a1 :  [99, 1, 2, 3]


In [38]:
# Both names point to the same object
a1 is b1

True

In [39]:
a1 = [0, 1, 2, 3]
b1 = a1
b1 = [0, 1, 2, 3]

a1 is b1

False

In [40]:
a1 = [0, 1, 2, 3]
b1 = a1[:]

a1 is b1

False
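The cells above can be condensed into a single check. The sketch below uses illustrative variable names that are not in the original notebook. It contrasts `==` (same value) with `is` (same object), and shows one extra point worth knowing: a slice copy is shallow, so nested lists are still shared.

```python
original = [[0, 1], [2, 3]]

alias = original          # same object, two names
shallow = original[:]     # new outer list, same inner lists

# Equal in value, but only the alias is the same object
print(alias == original, alias is original)      # True True
print(shallow == original, shallow is original)  # True False

# The outer lists are independent...
shallow.append([4, 5])
print(len(original))      # 2

# ...but the inner lists are shared, so this change is visible through both
shallow[0][0] = 99
print(original[0])        # [99, 1]
```

If fully independent nested data is needed, `copy.deepcopy` from the standard library is the usual tool.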

### Immutable case

Integers are immutable. Assigning a new value does not modify the object in place: the name is rebound to a different object in memory.

In [41]:
a = 0
b = a
a is b

True

In [42]:
a = 0
b = a
a = 2 

a is b

False
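The same contrast shows up with augmented assignment. In this minimal sketch (names are illustrative), `+=` on an integer rebinds the name to a new object, while `+=` on a list extends the existing object in place, so every alias sees the change.

```python
# Integers: += builds a new object and rebinds the name
n = 1000
m = n
n += 1
print(n, m)          # 1001 1000
print(n is m)        # False

# Lists: += extends the existing object in place
xs = [0, 1]
ys = xs
xs += [2]
print(xs, ys)        # [0, 1, 2] [0, 1, 2]
print(xs is ys)      # True
```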

## Chained indexing with lists

### Problem

In [43]:
a = [1, 5, 10, 3, 99, 5, 8]

a[2:5][0]

10

In [44]:
a[2:5][0] = 50

a

[1, 5, 10, 3, 99, 5, 8]

In [45]:
a_copie = a[2:5]
a_copie[0] = 0

print('a_copie : ', a_copie)
print('a : ', a)

a_copie :  [0, 3, 99]
a :  [1, 5, 10, 3, 99, 5, 8]


### Explanation

We should have known better: slicing a list returns a new list, so `a[2:5][0] = 50` modifies a temporary copy that is discarded immediately. This is what the official documentation says:

<img src="assets/slice_list_doc.png" alt="Smiley face" width="80%">

Screenshot taken from: https://docs.python.org/3/tutorial/introduction.html#lists
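To actually change the original list, the write has to target the list itself rather than a slice of it. A minimal sketch, reusing the list from the cells above:

```python
a = [1, 5, 10, 3, 99, 5, 8]

# Chained indexing: the slice is a temporary copy, so the write is lost
a[2:5][0] = 50
print(a)            # [1, 5, 10, 3, 99, 5, 8]

# Direct index assignment writes to the original list
a[2] = 50
print(a)            # [1, 5, 50, 3, 99, 5, 8]

# Slice assignment also writes to the original list
a[2:5] = [0, 0, 0]
print(a)            # [1, 5, 0, 0, 0, 5, 8]
```

The difference is that `a[2:5] = ...` is a single assignment on `a`, whereas `a[2:5][0] = ...` first builds a new list and then assigns into that.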

## Some pandas oddities: SettingWithCopyWarning

In [2]:
import pandas as pd
df = pd.read_csv('data/trees.csv')

### The straightforward case

In [47]:
df.ANNEEDEPLANTATION.isnull().sum()

442

In [5]:
print(df.

None


In [48]:
df[df['ANNEEDEPLANTATION'].isnull()].ANNEEDEPLANTATION = 0

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  self[name] = value


In [49]:
df.ANNEEDEPLANTATION.isnull().sum()

442

The solution suggested by the warning:

In [50]:
df.loc[ df['ANNEEDEPLANTATION'].isnull(), 'ANNEEDEPLANTATION'] = 0

df.ANNEEDEPLANTATION.isnull().sum()

0
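The same pattern can be tried without the CSV file. The sketch below assumes a tiny hand-made DataFrame as a stand-in for `trees.csv` (the values are invented for illustration). The point is that `.loc` selects rows and column in one indexing call, so the assignment reaches the DataFrame itself instead of an intermediate object.

```python
import pandas as pd

# Hypothetical stand-in for trees.csv, values invented for illustration
trees = pd.DataFrame({
    "GENRE_BOTA": ["Acer", None, "Fraxinus", "Tilia"],
    "ANNEEDEPLANTATION": [1987.0, float("nan"), 2001.0, float("nan")],
})

# Boolean mask of the rows to fix
missing_year = trees["ANNEEDEPLANTATION"].isnull()

# Rows and column in a single .loc call: the write lands in `trees`
trees.loc[missing_year, "ANNEEDEPLANTATION"] = 0

print(trees["ANNEEDEPLANTATION"].isnull().sum())  # 0
print(trees["ANNEEDEPLANTATION"].tolist())        # [1987.0, 0.0, 2001.0, 0.0]
```

For this particular task, `trees["ANNEEDEPLANTATION"] = trees["ANNEEDEPLANTATION"].fillna(0)` would be an equivalent alternative.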

### The sneaky case

In [51]:
raw_data = pd.read_csv('../data/trees.csv')

# The columns I am actually interested in
data_of_interest = raw_data[['GENRE_BOTA', 'ANNEEDEPLANTATION']]

In [52]:
data_of_interest.loc[ data_of_interest['GENRE_BOTA'].isnull(), 'GENRE_BOTA'] = 'indefini'

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  self.obj[item] = s


In [53]:
data_of_interest

| | GENRE_BOTA | ANNEEDEPLANTATION |
|---|---|---|
| 0 | indefini | |
| 1 | indefini | |
| 2 | indefini | |
| 3 | indefini | |
| 4 | indefini | |
| ... | ... | ... |
| 31238 | Fraxinus | 2001.0 |
| 31239 | Fraxinus | 2001.0 |
| 31240 | Fraxinus | 2001.0 |
| 31241 | Fraxinus | 2001.0 |
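Here the assignment already uses `.loc`, yet pandas still warns, because `data_of_interest` was derived from `raw_data` and pandas cannot tell which of the two the write is meant for. One common way to make the intent explicit is to take a copy with `.copy()` when building the subset. The sketch below assumes a small invented DataFrame in place of `trees.csv`; the `CODE` values are hypothetical.

```python
import pandas as pd

# Hypothetical stand-in for trees.csv, values invented for illustration
raw = pd.DataFrame({
    "GENRE_BOTA": [None, "Fraxinus", None],
    "ANNEEDEPLANTATION": [float("nan"), 2001.0, 1987.0],
    "CODE": ["ESP1", "ESP2", "ESP3"],
})

# Explicit copy: `subset` is an independent DataFrame from here on
subset = raw[["GENRE_BOTA", "ANNEEDEPLANTATION"]].copy()

subset.loc[subset["GENRE_BOTA"].isnull(), "GENRE_BOTA"] = "indefini"

print(subset["GENRE_BOTA"].tolist())      # ['indefini', 'Fraxinus', 'indefini']
print(raw["GENRE_BOTA"].isnull().sum())   # 2, the original is untouched
```

The trade-off is memory: `.copy()` duplicates the selected columns, which is usually acceptable for a working subset.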


# Test

In [76]:
raw_data = pd.read_csv('../data/trees.csv')

test = raw_data.loc[:, ['DIAMETREARBRE', 'ANNEEABATTAGE']]

In [78]:
test['ANNEEABATTAGE'] = 0
raw_data.update(test)

In [79]:
raw_data

| | ELEM_POINT_ID | CODE | NOM | GENRE | GENRE_DESC | CATEGORIE | CATEGORIE_DESC | SOUS_CATEGORIE | SOUS_CATEGORIE_DESC | CODE_PARENT | ... | COURRIER | IDENTIFIANTPLU | TYPEIMPLANTATIONPLU | INTITULEPROTECTIONPLU | ANNEEABATTAGE | ESSOUCHEMENT | DIAMETREARBRE | CAUSEABATTAGE | COLLECTIVITE | GeoJSON |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 37993 | ESP37969 | ESP37969 | VEG | VEGETATION | ESP01 | Arbre | ESP065 | Arbre d'enceintes fermées | ESP37898 | ... | | | | | 0.0 | | | | | {"type":"Point","coordinates":[5.7603469008942... |
| 1 | 37992 | ESP37968 | ESP37968 | VEG | VEGETATION | ESP01 | Arbre | ESP065 | Arbre d'enceintes fermées | ESP37898 | ... | | | | | 0.0 | | | | | {"type":"Point","coordinates":[5.7598264646441... |
| 2 | 37991 | ESP37967 | ESP37967 | VEG | VEGETATION | ESP01 | Arbre | ESP065 | Arbre d'enceintes fermées | ESP37898 | ... | | | | | 0.0 | | | | | {"type":"Point","coordinates":[5.7599807314486... |
| 3 | 37990 | ESP37966 | ESP37966 | VEG | VEGETATION | ESP01 | Arbre | ESP065 | Arbre d'enceintes fermées | ESP37898 | ... | | | | | 0.0 | | | | | {"type":"Point","coordinates":[5.7600570301267... |
| 4 | 37989 | ESP37965 | ESP37965 | VEG | VEGETATION | ESP01 | Arbre | ESP065 | Arbre d'enceintes fermées | ESP37898 | ... | | | | | 0.0 | | | | | {"type":"Point","coordinates":[5.7600202792924... |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 31238 | 7598 | ESP2513 | ESP2513 | VEG | VEGETATION | ESP01 | Arbre | ESP151 | Arbre de voirie | ESP1277 | ... | | | | | 0.0 | | | | Grenoble Alpes Métropole | {"type":"Point","coordinates":[5.7117169490564... |
| 31239 | 7596 | ESP2512 | ESP2512 | VEG | VEGETATION | ESP01 | Arbre | ESP151 | Arbre de voirie | ESP1277 | ... | | | | | 0.0 | | | | Grenoble Alpes Métropole | {"type":"Point","coordinates":[5.7115904446110... |
| 31240 | 7594 | ESP2511 | ESP2511 | VEG | VEGETATION | ESP01 | Arbre | ESP151 | Arbre de voirie | ESP1277 | ... | | | | | 0.0 | | | | Grenoble Alpes Métropole | {"type":"Point","coordinates":[5.7114873970721... |
| 31241 | 19 | ESP1798 | ESP1798 | VEG | VEGETATION | ESP01 | Arbre | ESP151 | Arbre de voirie | ESP1277 | ... | | | | | 0.0 | | | | Grenoble Alpes Métropole | {"type":"Point","coordinates":[5.7117859768817... |
