_See [Readme](https://github.com/fleuryc/oc_ingenieur-ia_P2-Participez-a-un-concours-sur-la-Smart-City#readme) for installation instructions_

---


# Data is for Good competition: let's help Paris become a smart city!

## Context

As part of the "Végétalisons la ville" ("Let's green the city") programme organised by the City of Paris, we present an exploratory analysis of the open data on the trees managed by the City of Paris.

The goal is to help Paris become a "smart city" by managing its trees as responsibly as possible, that is, by optimising the trips needed to maintain them.


## Tools used

We use Python and present the code, the results, and the analysis as a [Jupyter Notebook](https://jupyterlab.readthedocs.io/en/stable/getting_started/overview.html).

We also rely on the usual data exploration and analysis libraries to keep our code simple and efficient:
* [NumPy](https://numpy.org/doc/stable/user/quickstart.html) and [Pandas](https://pandas.pydata.org/docs/user_guide/index.html): scientific computing (statistics, algebra, ...) and manipulation of large, complex series and tables
* [Matplotlib](https://matplotlib.org/stable/tutorials/introductory/usage.html), [Pyplot](https://matplotlib.org/stable/tutorials/introductory/pyplot.html), [Seaborn](https://seaborn.pydata.org/tutorial/function_overview.html) and [Plotly](https://plotly.com/python/getting-started/): readable, interactive, and relevant charts


In [14]:
# Import libraries
import os.path
from zipfile import ZipFile as zf
import requests

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px
import plotly.graph_objects as go
from plotly.subplots import make_subplots

## If you use Notebook (and not JupyterLab), uncomment following lines
# import plotly.io as pio
# pio.renderers.default='notebook'


## Loading the data and first look

The data comes from [opendata.paris.fr](https://opendata.paris.fr/explore/dataset/les-arbres/information/) and represents "all the trees, including roadside trees, located within Paris and in the extra-muros cemeteries (outside Paris)."

We first simply load the data into memory and look at a few values.

In [15]:

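# Local location of the dataset
data_local_path = 'data/'
csv_filename = 'fr.openfoodfacts.org.products.csv'
csv_local_path = data_local_path+csv_filename

# Download and extract the archive only if the CSV is not already present
if not os.path.isfile(csv_local_path):
    os.makedirs(data_local_path, exist_ok=True)  # make sure the target folder exists
    zip_filename = csv_filename+'.zip'
    zip_url = 'https://s3-eu-west-1.amazonaws.com/static.oc-static.com/prod/courses/files/parcours-data-scientist/P2/'+zip_filename
    zip_local_path = data_local_path+zip_filename

    r = requests.get(zip_url)
    r.raise_for_status()  # stop early if the download failed
    with open(zip_local_path, 'wb') as f:
        f.write(r.content)

    with zf(zip_local_path, 'r') as zip_file:
        zip_file.extractall(data_local_path)

# The file is tab-separated despite its .csv extension
raw_data = pd.read_csv(csv_local_path, sep='\t')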

# Display data types and empty values
raw_data.info()


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 320772 entries, 0 to 320771
Columns: 162 entries, code to water-hardness_100g
dtypes: float64(106), object(56)
memory usage: 396.5+ MB


In [55]:
num_rows = len(raw_data.index)
columns_emptiness = pd.DataFrame({
    col : { 
        'count': raw_data[col].isna().sum(),
        'percent': raw_data[col].isna().sum() / num_rows * 100,
    } for col in raw_data.columns
}).transpose().sort_values(by=['count'])

fig = px.bar(columns_emptiness,
    color='percent',
    y='percent',
    labels={
        'index':'column name',
        'percent':'% of empty values',
        'count':'# of empty values',
    },
    hover_data=['count'],
    title='Empty values per column',
    width=1200,
    height=800,
)
fig.show()
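
The missing-value table built in the cell above calls `isna()` twice per column inside a Python loop. As a minimal sketch, the same two figures can be obtained in a single vectorized pass. The toy frame `toy` and the helper name `missing_summary` are illustrative assumptions and are not part of the notebook.

```python
import numpy as np
import pandas as pd


def missing_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Return the count and percentage of missing values per column."""
    na_mask = df.isna()  # boolean frame: True where a value is missing
    summary = pd.DataFrame({
        "count": na_mask.sum(),           # number of missing values per column
        "percent": na_mask.mean() * 100,  # share of missing values per column, in %
    })
    return summary.sort_values(by="count")


# Hypothetical toy data standing in for raw_data
toy = pd.DataFrame({
    "a": [1.0, np.nan, 3.0, 4.0],
    "b": [np.nan, np.nan, np.nan, "x"],
    "c": ["u", "v", "w", "z"],
})
summary = missing_summary(toy)
print(summary)
```

The resulting frame has the same `count` and `percent` columns as `columns_emptiness`, so it could be passed to `px.bar` in the same way.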


In [69]:
clean_data = raw_data.dropna(
        axis='columns',
        thresh=.1 * num_rows
    ).drop_duplicates()

# Display data types and empty values
clean_data.info()

clean_data = clean_data[['product_name',
    'packaging_tags', 'brands_tags', 'manufacturing_places_tags', 'countries_tags',
    'main_category', 'categories_tags', 
    'labels_tags', 'additives_n', 'additives_tags', 
    'nutrition_grade_fr', 'nutrition-score-fr_100g',
    'energy_100g', 'saturated-fat_100g', 'sugars_100g', 'salt_100g', 'sodium_100g',
    'fruits-vegetables-nuts_100g', 'fiber_100g', 'proteins_100g',
]]

# Display statistical summary of each column
clean_data.describe(include="all")


<class 'pandas.core.frame.DataFrame'>
Int64Index: 320772 entries, 0 to 320771
Data columns (total 62 columns):
 #   Column                                   Non-Null Count   Dtype  
---  ------                                   --------------   -----  
 0   code                                     320749 non-null  object 
 1   url                                      320749 non-null  object 
 2   creator                                  320770 non-null  object 
 3   created_t                                320769 non-null  object 
 4   created_datetime                         320763 non-null  object 
 5   last_modified_t                          320772 non-null  object 
 6   last_modified_datetime                   320772 non-null  object 
 7   product_name                             303010 non-null  object 
 8   generic_name                             52795 non-null   object 
 9   quantity                                 104819 non-null  object 
 10  packaging                       

| | code | url | creator | created_t | created_datetime | last_modified_t | last_modified_datetime | product_name | generic_name | quantity | ... | fiber_100g | proteins_100g | salt_100g | sodium_100g | vitamin-a_100g | vitamin-c_100g | calcium_100g | iron_100g | nutrition-score-fr_100g | nutrition-score-uk_100g |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 320749.0 | 320749 | 320770 | 320769.0 | 320763 | 320772.0 | 320772 | 303010 | 52795 | 104819 | ... | 200886.0 | 259922.0 | 255510.0 | 255463.0 | 137554.0 | 140867.0 | 141050.0 | 140462.0 | 221210.0 | 221210.0 |
| unique | 320638.0 | 320749 | 3535 | 189636.0 | 189568 | 180639.0 | 180495 | 221347 | 38584 | 13826 | ... | | | | | | | | | | |
| top | 11110820000.0 | http://world-fr.openfoodfacts.org/produit/0036... | usda-ndb-import | 1489056000.0 | 2017-03-09T10:37:09Z | 1439142000.0 | 2015-08-09T17:35:42Z | Ice Cream | Pâtes alimentaires au blé dur de qualité supér... | 500 g | ... | | | | | | | | | | |
| freq | 2.0 | 1 | 169868 | 20.0 | 20 | 33.0 | 33 | 410 | 201 | 4669 | ... | | | | | | | | | | |
| mean | | | | | | | | | | | ... | 2.862111 | 7.07594 | 2.028624 | 0.798815 | 0.000397 | 0.023367 | 0.125163 | 0.003652 | 9.165535 | 9.058049 |
| std | | | | | | | | | | | ... | 12.867578 | 8.409054 | 128.269454 | 50.504428 | 0.073278 | 2.236451 | 3.318263 | 0.214408 | 9.055903 | 9.183589 |
| min | | | | | | | | | | | ... | -6.7 | -800.0 | 0.0 | 0.0 | -0.00034 | -0.0021 | 0.0 | -0.00026 | -15.0 | -15.0 |
| 25% | | | | | | | | | | | ... | 0.0 | 0.7 | 0.0635 | 0.025 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 |
| 50% | | | | | | | | | | | ... | 1.5 | 4.76 | 0.58166 | 0.229 | 0.0 | 0.0 | 0.035 | 0.00101 | 10.0 | 9.0 |
| 75% | | | | | | | | | | | ... | 3.6 | 10.0 | 1.37414 | 0.541 | 0.000107 | 0.0037 | 0.106 | 0.0024 | 16.0 | 16.0 |
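
In the cleaning cell above, the `thresh` argument of `DataFrame.dropna` is the minimum number of non-missing values a column needs in order to be kept, so `thresh=.1 * num_rows` keeps the columns that are at least about 10% filled. The sketch below illustrates this on hypothetical toy data; the frame `toy` and the 20% cut-off are assumptions chosen only to keep the example small.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: 10 rows, columns with different fill rates
toy = pd.DataFrame({
    "full": list(range(10)),         # 10 non-missing values
    "sparse": [1.0] + [np.nan] * 9,  # 1 non-missing value
    "empty": [np.nan] * 10,          # 0 non-missing values
})

# Keep only the columns holding at least 2 non-missing values,
# i.e. 20% of the 10 rows
min_non_null = 2
kept = toy.dropna(axis="columns", thresh=min_non_null)
print(list(kept.columns))
```

Only `full` survives here: `sparse` has a single value, which is below the required minimum of 2, and `empty` has none.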


In [21]:

# display value frequencies per column
for col in raw_data.columns:
    print(f'\n \
================================================\n \
>    { col }\n \
------------------------------------------------')
    display(raw_data[col].value_counts(dropna=False))



 >    code
 ------------------------------------------------


NaN              23
635646            2
37600105033       2
31200029997       2
16300166360       2
                 ..
3023290102633     1
2441879031357     1
72734127155       1
858246001714      1
3760089530627     1
Name: code, Length: 320639, dtype: int64


 >    url
 ------------------------------------------------


NaN                                                                                                                                             23
http://world-fr.openfoodfacts.org/produit/0070038634102/superior-selection-sparkling-soda-sicilian-lemonade-associated-wholesale-grocers-inc     1
http://world-fr.openfoodfacts.org/produit/0718122066070/organic-ice-cream-three-twins-ice-cream                                                  1
http://world-fr.openfoodfacts.org/produit/0041800354504/100-grape-juice-grape-welch-s                                                            1
http://world-fr.openfoodfacts.org/produit/5601151989814                                                                                          1
                                                                                                                                                ..
http://world-fr.openfoodfacts.org/produit/3256222254272/fromage-frais-nature-u-bio                                    


 >    creator
 ------------------------------------------------


usda-ndb-import               169868
openfoodfacts-contributors     40117
kiliweb                        13891
date-limite-app                11918
openfood-ch-import             11478
                               ...  
jipi                               1
babadouk                           1
sturm                              1
bigdog1948                         1
zoukanss                           1
Name: creator, Length: 3536, dtype: int64


 >    created_t
 ------------------------------------------------


1489055829    20
1489077120    20
1489050353    19
1489077322    18
1489077002    17
              ..
1476556325     1
1434131137     1
1463913106     1
1428523667     1
1465647103     1
Name: created_t, Length: 189637, dtype: int64


 >    created_datetime
 ------------------------------------------------


2017-03-09T10:37:09Z    20
2017-03-09T16:32:00Z    20
2017-03-09T09:05:53Z    19
2017-03-09T16:35:22Z    18
2017-03-09T16:30:52Z    17
                        ..
2015-07-14T16:35:27Z     1
2017-03-18T12:26:08Z     1
2012-11-20T18:07:01Z     1
2016-12-06T16:17:58Z     1
2016-03-25T11:52:29Z     1
Name: created_datetime, Length: 189569, dtype: int64


 >    last_modified_t
 ------------------------------------------------


1439141742    33
1439141756    30
1439141747    29
1439141745    28
1439141730    28
              ..
1480167366     1
1487769543     1
1474138056     1
1464944525     1
1466695679     1
Name: last_modified_t, Length: 180639, dtype: int64


 >    last_modified_datetime
 ------------------------------------------------


2015-08-09T17:35:42Z    33
2015-08-09T17:35:56Z    30
2015-08-09T17:35:47Z    29
2015-08-09T17:35:35Z    28
2015-08-09T17:35:45Z    28
                        ..
2016-10-02T10:41:21Z     1
2016-04-24T12:31:55Z     1
2016-10-31T11:54:18Z     1
2015-02-01T20:01:30Z     1
2016-10-30T13:08:00Z     1
Name: last_modified_datetime, Length: 180495, dtype: int64


 >    product_name
 ------------------------------------------------


NaN                                                   17762
Ice Cream                                               410
Extra Virgin Olive Oil                                  303
Potato Chips                                            281
Premium Ice Cream                                       226
                                                      ...  
Maple Brown Sugar With Almonds Multi Grain Oatmeal        1
Couscous semi-complet - grains moyens bio                 1
Gazpacho, Mediterranean Vegetable Medley                  1
Les Entremets, riz au lait caramel                        1
Soupe Phnom Penh Nam Vang Oh Ricey                        1
Name: product_name, Length: 221348, dtype: int64


 >    generic_name
 ------------------------------------------------


NaN                                                                           267977
Pâtes alimentaires au blé dur de qualité supérieure                              201
Aliment pour bébés                                                                92
Pâtes alimentaires de qualité supérieure                                          82
Jambon cuit supérieur                                                             80
                                                                               ...  
Filets de poulet et de dinde traités en salaison, féculés, cuits standards         1
Céréales grillées au miel                                                          1
Saucisson cuit à l'ail qualité choix                                               1
Assaisonnement pour soupe                                                          1
Mélange d'aromates pour crème fraîche                                              1
Name: generic_name, Length: 38585, dtype: int64


 >    quantity
 ------------------------------------------------


NaN                                   215953
500 g                                   4669
200 g                                   4063
250 g                                   3883
100 g                                   3043
                                       ...  
8 capsules                                 1
4.23 oz                                    1
6 x 4 biscuits (300 g)                     1
825 g ou 850 ml, net égoutté 455 g         1
300 g (24 tranches)                        1
Name: quantity, Length: 13827, dtype: int64


 >    packaging
 ------------------------------------------------


NaN                                                                241812
Carton                                                               2153
Sachet,Plastique                                                     2141
Plastique                                                            1902
Bouteille,Verre                                                      1342
                                                                    ...  
box,Box                                                                 1
Frais,Sachet plastique,Plastique (PE)                                   1
Tetra Brik Aseptic,84 C/PAP,196×92×59,перфорация для открывания         1
Boite,Carton,Frais,Papier                                               1
carton,bouteille verre,capsule,Bouteille:verre brune                    1
Name: packaging, Length: 14548, dtype: int64


 >    packaging_tags
 ------------------------------------------------


NaN                                                            241811
sachet,plastique                                                 3959
carton                                                           2927
plastique                                                        2793
barquette,plastique                                              2136
                                                                ...  
surgele,plastique,film                                              1
frais,barquette,couvercle,plastique,barquette                       1
pot,verre,pot-en-verre,couvercle-plastique,point-vert               1
bouteille,verre,bouteille-en-verre-transparante-avec-relief         1
plastico,vertisac                                                   1
Name: packaging_tags, Length: 12065, dtype: int64


 >    brands
 ------------------------------------------------


NaN               28412
Carrefour          2978
Auchan             2340
U                  2050
Meijer             1995
                  ...  
Paradis glaces        1
SOIGNON               1
El Jaliciense         1
Chef's Select         1
martine mahé          1
Name: brands, Length: 58785, dtype: int64


 >    brands_tags
 ------------------------------------------------


NaN                       28420
carrefour                  3149
auchan                     2468
u                          2082
meijer                     1996
                          ...  
weetabix-limited              1
sultanines                    1
millenaire                    1
vignerons-de-caractere        1
grossglockner-inc             1
Name: brands_tags, Length: 50254, dtype: int64


 >    categories
 ------------------------------------------------


NaN                                                                                                                                                                            236362
Snacks sucrés,Biscuits et gâteaux,Biscuits                                                                                                                                        301
Biscuits                                                                                                                                                                          287
Snacks sucrés,Biscuits et gâteaux,Biscuits,Biscuits au chocolat                                                                                                                   247
Aliments et boissons à base de végétaux,Aliments d'origine végétale,Petit-déjeuners,Céréales et pommes de terre,Céréales et dérivés,Céréales pour petit-déjeuner                  222
                                                                                          


 >    categories_tags
 ------------------------------------------------


NaN                                                                                                                                                                                                                                                                                                                                  236383
en:sugary-snacks,en:biscuits-and-cakes,en:biscuits                                                                                                                                                                                                                                                                                      802
en:sugary-snacks,en:chocolates,en:dark-chocolates                                                                                                                                                                                                                                                                                       609
en:s


 >    categories_fr
 ------------------------------------------------


NaN                                                                                                                                                                 236361
Snacks sucrés,Biscuits et gâteaux,Biscuits                                                                                                                             802
Snacks sucrés,Chocolats,Chocolats noirs                                                                                                                                609
Snacks sucrés,Confiseries,Bonbons                                                                                                                                      526
Aliments et boissons à base de végétaux,Aliments d'origine végétale,Petit-déjeuners,Céréales et pommes de terre,Céréales et dérivés,Céréales pour petit-déjeuner       522
                                                                                                                                                 


 >    origins
 ------------------------------------------------


NaN                                                        298582
France                                                       5171
España                                                        569
Italie                                                        473
Australia                                                     434
                                                            ...  
Afrique de l'Ouest                                              1
Italie,Pinerolo                                                 1
Deutschland                                                     1
Navarra,Espagne                                                 1
Conil de la Frontera,Cádiz (provincia),Andalucía,España         1
Name: origins, Length: 4841, dtype: int64


 >    origins_tags
 ------------------------------------------------


NaN                                               298619
france                                              5303
union-europeenne                                     625
espana                                               589
italie                                               506
                                                   ...  
mozzarella,emmental,france                             1
mondelez-france-sas-bp-100-92146-clamart-cedex         1
agricultura-no-ue,cacao,kyela,tanzania                 1
fromagerie-de-grieges                                  1
bassin-de-l-adour,aop                                  1
Name: origins_tags, Length: 4373, dtype: int64


 >    manufacturing_places
 ------------------------------------------------


NaN                                                                                                                                                                                         284271
France                                                                                                                                                                                        9371
Italie                                                                                                                                                                                        1251
Deutschland                                                                                                                                                                                    776
Belgique                                                                                                                                                                                       744
                         


 >    manufacturing_places_tags
 ------------------------------------------------


NaN                                                                       284277
france                                                                      9451
italie                                                                      1278
deutschland                                                                  778
belgique                                                                     754
                                                                           ...  
bunheiro-freguesia,murtosa,aveiro-distrito,region-centro,portugal              1
elche-alicante                                                                 1
societe-industrielle-de-bondues,bondues,nord,nord-pas-de-calais,france         1
330c-chemin-des-scieries,38840,saint-hilaire-du-rosier,france                  1
santa-cruz-pedania,murcia,murcia-comunidad-autonoma,espana                     1
Name: manufacturing_places_tags, Length: 6737, dtype: int64


 >    labels
 ------------------------------------------------


NaN                                                                                                                                           274213
Organic, EU Organic, fr:AB Agriculture Biologique                                                                                               3223
Point Vert                                                                                                                                      2053
Vegetariano,Vegano                                                                                                                              1048
Bio,Bio européen,AB Agriculture Biologique                                                                                                       979
                                                                                                                                               ...  
No soy,Non GMO Project Verified                                                                           


 >    labels_tags
 ------------------------------------------------


NaN                                                                                                                                                                                            274128
en:organic,en:eu-organic,fr:ab-agriculture-biologique                                                                                                                                            5311
en:green-dot                                                                                                                                                                                     2456
en:vegetarian,en:vegan                                                                                                                                                                           1654
en:green-dot,fr:eco-emballages                                                                                                                                                                    835
          


 >    labels_fr
 ------------------------------------------------


NaN                                                                                          274106
Bio,Bio européen,AB Agriculture Biologique                                                     5311
Point Vert                                                                                     2456
Végétarien,Végétalien                                                                          1654
Point Vert,Eco-emballages                                                                       835
                                                                                              ...  
FSC,Fruits-presses,Tetra-pak                                                                      1
Labels de distributeurs,Qualité Carrefour,Conditionne-en-france                                   1
en:Pure-australian                                                                                1
Sans exhausteur de goût,Sans huile de palme,Escalope-100-filet                                    1



 >    emb_codes
 ------------------------------------------------


NaN                    291466
EMB 56251E                218
FR 85.154.002 EC          128
EMB 49331H                105
FR 72.264.002 EC          103
                        ...  
EMB 21423D,21 E 549         1
DE EK 183 EC                1
FR 69.078.050 EC            1
FR 56.012.001 EC            1
NM 469                      1
Name: emb_codes, Length: 8463, dtype: int64


 >    emb_codes_tags
 ------------------------------------------------


NaN                            291469
emb-56251e                        218
fr-85-154-002-ec                  128
emb-49331h                        105
fr-72-264-002-ec                  103
                                ...  
fr-52-465-001-ec                    1
fr-94-022-088-ec                    1
emb-72373a,fr-72-373-002-ec         1
fr-79-049-002                       1
de-nw-60027-ec                      1
Name: emb_codes_tags, Length: 8159, dtype: int64


 >    first_packaging_code_geo
 ------------------------------------------------


NaN                    301969
47.633333,-2.666667       279
47.833333,-0.333333       245
49.266667,-0.666667       197
48.1,-4.333333            169
                        ...  
43.666667,4.633333          1
46.583333,-1.516667         1
45.8,0.616667               1
44.816667,0.45              1
48.466667,-3.516667         1
Name: first_packaging_code_geo, Length: 1603, dtype: int64


 >    cities
 ------------------------------------------------


NaN    320749
c           9
a           8
b           6
Name: cities, dtype: int64


 >    cities_tags
 ------------------------------------------------


NaN                                         300452
theix-morbihan-france                          270
douarnenez-finistere-france                    154
sable-sur-sarthe-sarthe-france                 146
saint-martin-des-entrees-calvados-france       136
                                             ...  
monein-pyrenees-atlantiques-france               1
maizey-meuse-france                              1
lambersart-nord-france                           1
morne-a-l-eau-guadeloupe-france                  1
oraison-alpes-de-haute-provence-france           1
Name: cities_tags, Length: 2573, dtype: int64


 >    purchase_places
 ------------------------------------------------


NaN                                  262579
France                                11762
Lyon,France                            3101
Courrières,France                      2279
Madrid,España                          2000
                                      ...  
Aubervilliers - France                    1
Caluire,Rhône,Paris,France                1
Brétigny-sur-Orge,France,Rennes           1
Le Palais,Belle-Île-en-Mer,France         1
Belgique Waterloo                         1
Name: purchase_places, Length: 5121, dtype: int64


 >    stores
 ------------------------------------------------


NaN                                           269050
Carrefour                                       6465
Auchan                                          2869
Leclerc                                         2826
Cora                                            2274
                                               ...  
Woolworths Town Hall,Woolworths,Coles,Bilo         1
intermarché,Carrefour,Leclerc                      1
Auchan,Simply market                               1
Costcutter                                         1
Lidl de Xabregas                                   1
Name: stores, Length: 3260, dtype: int64


 >    countries
 ------------------------------------------------


US                                         169928
France                                      77292
en:FR                                       16979
Suisse                                      12314
Deutschland                                  6161
                                            ...  
Česko,Itálie                                    1
Inde                                            1
en:EU, US                                       1
Belgium,France,Italy,United Kingdom             1
France,Italy,Netherlands,United Kingdom         1
Name: countries, Length: 1435, dtype: int64


 >    countries_tags
 ------------------------------------------------


en:united-states                                                     172998
en:france                                                             94391
en:switzerland                                                        14953
en:germany                                                             7870
en:spain                                                               5009
                                                                      ...  
en:austria,en:germany,en:spain                                            1
en:france,fr:cora                                                         1
en:australia,en:france,en:united-kingdom,en:scotland                      1
en:belgium,en:france,en:germany,en:greece,en:netherlands,en:spain         1
en:denemarken,en:finand,en:frankrijk,en:italiaans,en:zwitserland          1
Name: countries_tags, Length: 726, dtype: int64


 >    countries_fr
 ------------------------------------------------


États-Unis                                    172998
France                                         94392
Suisse                                         14953
Allemagne                                       7870
Espagne                                         5009
                                               ...  
Finlande,France,Grèce,Italie,Espagne,Suède         1
France,Russie,Espagne,Royaume-Uni                  1
États-Unis,en:Niederlande                          1
Belgique,France,Martinique,Suisse                  1
Italie,Lituanie                                    1
Name: countries_fr, Length: 723, dtype: int64


 >    ingredients_text
 ------------------------------------------------


NaN                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     


 >    allergens
 ------------------------------------------------


NaN                                                                   292428
blé                                                                     1279
Lait                                                                     829
lait                                                                     702
soja                                                                     513
                                                                       ...  
parmesan, lait                                                             1
wheat, barley malt                                                         1
blé, lait, lactosérum, beurre, orge, soja, beurre, disulfite, lait         1
blé, crème, crème, blé, blé, œufs                                          1
blé, œufs, œufs, œufs, amandes, soja                                       1
Name: allergens, Length: 12940, dtype: int64


 >    allergens_fr
 ------------------------------------------------


NaN                                                                              320753
http://fr.openfoodfacts.org/images/products/303/349/114/3014/front.17.200.jpg         1
http://fr.openfoodfacts.org/images/products/303/349/043/3864/front.3.200.jpg          1
http://fr.openfoodfacts.org/images/products/303/349/095/8008/front.8.200.jpg          1
http://fr.openfoodfacts.org/images/products/303/349/059/4510/front.20.200.jpg         1
http://fr.openfoodfacts.org/images/products/303/349/085/3587/front.9.200.jpg          1
http://fr.openfoodfacts.org/images/products/303/349/512/5016/front.8.200.jpg          1
http://fr.openfoodfacts.org/images/products/303/349/085/3570/front.6.200.jpg          1
http://fr.openfoodfacts.org/images/products/303/349/091/7418/front.10.200.jpg         1
http://fr.openfoodfacts.org/images/products/303/349/084/1782/front.9.200.jpg          1
http://fr.openfoodfacts.org/images/products/303/349/085/3884/front.11.200.jpg         1
http://fr.openfoodfacts.org/imag


 >    traces
 ------------------------------------------------


NaN                                                            296419
Fruits à coque                                                   1240
Lait                                                              557
Gluten                                                            514
Œufs                                                              416
                                                                ...  
Sesame seeds,Milk                                                   1
gluten,oeuf,soja,céleri,crustacé,poisson,moutarde,mollusque         1
soja,amandes,noisettes,lait,blé,oeuf                                1
Haselnüsse,Mandeln,Erdnüsse                                         1
arachide,fruits à coque,lait,soja,graines de sésame                 1
Name: traces, Length: 8379, dtype: int64


 >    traces_tags
 ------------------------------------------------


NaN                                                                                                                              296443
en:nuts                                                                                                                            2051
en:milk                                                                                                                             844
en:eggs                                                                                                                             788
en:gluten                                                                                                                           718
                                                                                                                                  ...  
en:eggs,en:lupin,de:nusse                                                                                                             1
fr:peuvent-contenir-des-noyaux                  


 >    traces_fr
 ------------------------------------------------


NaN                                                                     296420
Fruits à coque                                                            2051
Lait                                                                       844
Œufs                                                                       788
Gluten                                                                     718
                                                                         ...  
Fruits à coque,en:Cashew-nuts                                                1
Céleri,Œufs,Lupin,Lait,Mollusques,Moutarde,Graines de sésame,Soja            1
Poisson,Fruits à coque,Anhydride sulfureux et sulfites                       1
Céleri,Œufs,Lait,Moutarde,Arachides,Graines de sésame                        1
Gluten,Moutarde,Fruits à coque,Graines de sésame,Soja,Lait-de-chevre         1
Name: traces_fr, Length: 3585, dtype: int64


 >    serving_size
 ------------------------------------------------


NaN                                        109441
240 ml (8 fl oz)                             5496
28 g (1 oz)                                  5374
28 g (1 ONZ)                                 3770
15 ml (1 Tbsp)                               2959
                                            ...  
33cl (1bouteille)                               1
28.75 g (1 MADELEINE)                           1
93 g (1 CONE)                                   1
9 g (1.6 Tbsp)                                  1
75 g (0.75 CUP PREPARED (200G) | ABOUT)         1
Name: serving_size, Length: 25424, dtype: int64


 >    no_nutriments
 ------------------------------------------------


NaN    320772
Name: no_nutriments, dtype: int64


 >    additives_n
 ------------------------------------------------


0.0     94259
NaN     71833
1.0     46509
2.0     36520
3.0     23680
4.0     15243
5.0     10935
6.0      7290
7.0      4702
8.0      3359
9.0      2194
10.0     1336
11.0      893
12.0      589
13.0      376
14.0      325
15.0      224
16.0      128
17.0      109
18.0       68
19.0       55
20.0       48
22.0       27
21.0       21
23.0       15
25.0       11
24.0       10
31.0        4
26.0        3
28.0        2
29.0        2
27.0        2
Name: additives_n, dtype: int64


 >    additives
 ------------------------------------------------


NaN


 >    additives_tags
 ------------------------------------------------


NaN                                                                166092
en:e322                                                              8264
en:e330                                                              7709
en:e375,en:e101                                                      7624
en:e300                                                              3024
                                                                    ...  
en:e330,en:e331,en:e444,en:e950,en:e385                                 1
en:e471,en:e450,en:e320,en:e223,en:e451,en:e310,en:e330,en:e621         1
en:e160,en:e300                                                         1
en:e330,en:e331,en:e340i,en:e211,en:e955,en:e110,en:e950                1
en:e262,en:e326,en:e300,en:e301                                         1
Name: additives_tags, Length: 41538, dtype: int64


 >    additives_fr
 ------------------------------------------------


NaN                                                                                                                                                                                 166092
E322 - Lécithines                                                                                                                                                                     8264
E330 - Acide citrique                                                                                                                                                                 7709
E375 - Acide nicotinique,E101 - Riboflavine                                                                                                                                           7624
E300 - Acide ascorbique                                                                                                                                                               3024
                                                                 


 >    ingredients_from_palm_oil_n
 ------------------------------------------------


0.0    244104
NaN     71833
1.0      4776
2.0        59
Name: ingredients_from_palm_oil_n, dtype: int64


 >    ingredients_from_palm_oil
 ------------------------------------------------


NaN    320772
Name: ingredients_from_palm_oil, dtype: int64


 >    ingredients_from_palm_oil_tags
 ------------------------------------------------


NaN                                                           315937
huile-de-palme                                                  4586
e304-palmitate-d-ascorbyle                                       158
huile-de-palme,e304-palmitate-d-ascorbyle                         35
oleine-de-palme                                                   19
mono-et-diglycerides-d-acides-gras-de-palme                       12
e304-palmitate-d-ascorbyle,huile-de-palme                          6
huile-de-palme,oleine-de-palme                                     6
huile-de-palme,stearine-de-palme                                   5
oleine-de-palme,e304-palmitate-d-ascorbyle                         2
huile-de-palme,mono-et-diglycerides-d-acides-gras-de-palme         2
oleine-de-palme,huile-de-palme                                     1
stearine-de-palme                                                  1
stearine-de-palme,huile-de-palme                                   1
mono-et-diglycerides-d-acides-gras


 >    ingredients_that_may_be_from_palm_oil_n
 ------------------------------------------------


0.0    237243
NaN     71833
1.0     10037
2.0      1321
3.0       286
4.0        45
5.0         6
6.0         1
Name: ingredients_that_may_be_from_palm_oil_n, dtype: int64


 >    ingredients_that_may_be_from_palm_oil
 ------------------------------------------------


NaN    320772
Name: ingredients_that_may_be_from_palm_oil, dtype: int64


 >    ingredients_that_may_be_from_palm_oil_tags
 ------------------------------------------------


NaN                                                                                                                                                                                    309076
e160a-beta-carotene                                                                                                                                                                      2843
e471-mono-et-diglycerides-d-acides-gras-alimentaires                                                                                                                                     2579
huile-vegetale                                                                                                                                                                           2082
e433-monooleate-de-polyoxyethylene-de-sorbitane                                                                                                                                          2079
                                                  


 >    nutrition_grade_uk
 ------------------------------------------------


NaN    320772
Name: nutrition_grade_uk, dtype: int64


 >    nutrition_grade_fr
 ------------------------------------------------


NaN    99562
d      62763
c      45538
e      43030
a      35634
b      34245
Name: nutrition_grade_fr, dtype: int64


 >    pnns_groups_1
 ------------------------------------------------


NaN                        229259
unknown                     22624
Sugary snacks               12368
Beverages                    9033
Milk and dairy products      8825
Cereals and potatoes         8442
Fish Meat Eggs               8041
Composite foods              6747
Fruits and vegetables        5908
Fat and sauces               5216
Salty snacks                 2809
fruits-and-vegetables         987
sugary-snacks                 496
cereals-and-potatoes           16
salty-snacks                    1
Name: pnns_groups_1, dtype: int64


 >    pnns_groups_2
 ------------------------------------------------


NaN                                 226281
unknown                              22624
One-dish meals                        5546
Sweets                                4698
Biscuits and cakes                    4561
Non-sugared beverages                 4302
Cereals                               4106
Cheese                                4024
Dressings and sauces                  3602
Milk and yogurt                       3297
Processed meat                        3247
Chocolate products                    3109
Alcoholic beverages                   2909
Vegetables                            2840
Fish and seafood                      2638
Sweetened beverages                   2170
Appetizers                            2101
Fruits                                2068
Fruit juices                          1924
Bread                                 1838
Meat                                  1694
Fats                                  1614
Breakfast cereals                     1408
vegetables 


 >    states
 ------------------------------------------------


en:to-be-completed, en:nutrition-facts-completed, en:ingredients-completed, en:expiration-date-to-be-completed, en:packaging-code-to-be-completed, en:characteristics-to-be-completed, en:categories-to-be-completed, en:brands-completed, en:packaging-to-be-completed, en:quantity-to-be-completed, en:product-name-completed, en:photos-to-be-uploaded                               168905
en:to-be-checked, en:complete, en:nutrition-facts-completed, en:ingredients-completed, en:expiration-date-completed, en:characteristics-completed, en:categories-completed, en:brands-completed, en:packaging-completed, en:quantity-completed, en:product-name-completed, en:photos-validated, en:photos-uploaded                                                                       23401
en:to-be-checked, en:complete, en:nutrition-facts-completed, en:ingredients-completed, en:expiration-date-to-be-completed, en:characteristics-completed, en:categories-completed, en:brands-completed, en:packaging-completed, en:quantity


 >    states_tags
 ------------------------------------------------


en:to-be-completed,en:nutrition-facts-completed,en:ingredients-completed,en:expiration-date-to-be-completed,en:packaging-code-to-be-completed,en:characteristics-to-be-completed,en:categories-to-be-completed,en:brands-completed,en:packaging-to-be-completed,en:quantity-to-be-completed,en:product-name-completed,en:photos-to-be-uploaded                              168905
en:to-be-checked,en:complete,en:nutrition-facts-completed,en:ingredients-completed,en:expiration-date-completed,en:characteristics-completed,en:categories-completed,en:brands-completed,en:packaging-completed,en:quantity-completed,en:product-name-completed,en:photos-validated,en:photos-uploaded                                                                       23401
en:to-be-checked,en:complete,en:nutrition-facts-completed,en:ingredients-completed,en:expiration-date-to-be-completed,en:characteristics-completed,en:categories-completed,en:brands-completed,en:packaging-completed,en:quantity-completed,en:product-name-comple


 >    states_fr
 ------------------------------------------------


A compléter,Informations nutritionnelles complétées,Ingrédients complétés,Date limite à compléter,en:Packaging-code-to-be-completed,Caractéristiques à compléter,Catégories à compléter,Marques complétées,Emballage à compléter,Quantité à compléter,Nom du produit complete,Photos à envoyer                      168905
A vérifier,Complet,Informations nutritionnelles complétées,Ingrédients complétés,Date limite complétée,Caractéristiques complétées,Catégories complétées,Marques complétées,Emballage complété,Quantité complétée,Nom du produit complete,Photos validées,Photos envoyées                                            23401
A vérifier,Complet,Informations nutritionnelles complétées,Ingrédients complétés,Date limite à compléter,Caractéristiques complétées,Catégories complétées,Marques complétées,Emballage complété,Quantité complétée,Nom du produit complete,Photos validées,Photos envoyées                                          19080
A compléter,Informations nutritionnelles à compléter,In


 >    main_category
 ------------------------------------------------


NaN                                   236406
en:beverages                            6054
en:groceries                            2902
en:chocolates                           2789
en:plant-based-foods-and-beverages      2745
                                       ...  
en:organic-spelt-flour                     1
en:smoked-ham                              1
de:grana-padano                            1
de:speisefarben                            1
it:pandoro                                 1
Name: main_category, Length: 3544, dtype: int64


 >    main_category_fr
 ------------------------------------------------


NaN                                                        236406
Boissons                                                     6054
Epicerie                                                     2902
Chocolats                                                    2789
Aliments et boissons à base de végétaux                      2745
                                                            ...  
Galettes-au-beurre                                              1
Matiere-grasse-vegetale-a-tartiner                              1
en:Juice-beverage                                               1
Beurre-de-cacao-pour-applications-salees-matiere-grasse         1
es:Galleta-salada                                               1
Name: main_category_fr, Length: 3544, dtype: int64


 >    image_url
 ------------------------------------------------


NaN                                                                              244936
http://fr.openfoodfacts.org/images/products/00749145/front.3.400.jpg                  1
http://fr.openfoodfacts.org/images/products/370/027/840/3400/front.3.400.jpg          1
http://fr.openfoodfacts.org/images/products/01134742/front.6.400.jpg                  1
http://fr.openfoodfacts.org/images/products/405/270/024/1685/front.6.400.jpg          1
                                                                                  ...  
http://fr.openfoodfacts.org/images/products/325/039/150/5388/front.15.400.jpg         1
http://fr.openfoodfacts.org/images/products/302/169/002/1936/front.5.400.jpg          1
http://fr.openfoodfacts.org/images/products/335/003/021/5958/front.5.400.jpg          1
http://fr.openfoodfacts.org/images/products/502/104/710/4525/front.3.400.jpg          1
http://fr.openfoodfacts.org/images/products/401/180/057/3218/front.3.400.jpg          1
Name: image_url, Length: 75837, 


 >    image_small_url
 ------------------------------------------------


NaN                                                                              244936
http://fr.openfoodfacts.org/images/products/366/134/405/7159/front.8.200.jpg          1
http://fr.openfoodfacts.org/images/products/302/329/045/3926/front.9.200.jpg          1
http://fr.openfoodfacts.org/images/products/848/000/082/3908/front.3.200.jpg          1
http://fr.openfoodfacts.org/images/products/005/574/250/2770/front.3.200.jpg          1
                                                                                  ...  
http://fr.openfoodfacts.org/images/products/311/032/001/0065/front.8.200.jpg          1
http://fr.openfoodfacts.org/images/products/317/568/102/8111/front.42.200.jpg         1
http://fr.openfoodfacts.org/images/products/841/460/644/5974/front.15.200.jpg         1
http://fr.openfoodfacts.org/images/products/761/009/500/3003/front.3.200.jpg          1
http://fr.openfoodfacts.org/images/products/848/001/748/8961/front.5.200.jpg          1
Name: image_small_url, Length: 7


 >    energy_100g
 ------------------------------------------------


NaN       59659
0.0        8909
2092.0     5075
1674.0     4012
1494.0     3916
          ...  
66.9          1
2819.0        1
1531.9        1
231.4         1
891.3         1
Name: energy_100g, Length: 3998, dtype: int64


 >    energy-from-fat_100g
 ------------------------------------------------


NaN       319915
0.0          164
75.0          27
1200.0        24
1050.0        14
           ...  
235.0          1
3740.0         1
50.4           1
21.0           1
589.0          1
Name: energy-from-fat_100g, Length: 336, dtype: int64


 >    fat_100g
 ------------------------------------------------


NaN      76881
0.00     64504
25.00     3409
0.50      3202
32.14     2981
         ...  
47.02        1
35.38        1
42.76        1
23.09        1
20.19        1
Name: fat_100g, Length: 3379, dtype: int64


 >    saturated-fat_100g
 ------------------------------------------------


NaN       91218
0.000     68736
0.100      5355
3.570      3487
0.500      3302
          ...  
15.020        1
92.200        1
0.928         1
40.900        1
0.556         1
Name: saturated-fat_100g, Length: 2198, dtype: int64


 >    butyric-acid_100g
 ------------------------------------------------


NaN    320772
Name: butyric-acid_100g, dtype: int64


 >    caproic-acid_100g
 ------------------------------------------------


NaN    320772
Name: caproic-acid_100g, dtype: int64


 >    caprylic-acid_100g
 ------------------------------------------------


NaN    320771
7.4         1
Name: caprylic-acid_100g, dtype: int64


 >    capric-acid_100g
 ------------------------------------------------


NaN     320770
6.20         1
5.88         1
Name: capric-acid_100g, dtype: int64


 >    lauric-acid_100g
 ------------------------------------------------


NaN         320768
49.30000         1
46.20000         1
0.04473          1
49.00000         1
Name: lauric-acid_100g, dtype: int64


 >    myristic-acid_100g
 ------------------------------------------------


NaN     320771
18.9         1
Name: myristic-acid_100g, dtype: int64


 >    palmitic-acid_100g
 ------------------------------------------------


NaN    320771
8.1         1
Name: palmitic-acid_100g, dtype: int64


 >    stearic-acid_100g
 ------------------------------------------------


NaN    320771
3.0         1
Name: stearic-acid_100g, dtype: int64


 >    arachidic-acid_100g
 ------------------------------------------------


NaN       320748
7.300          3
12.900         2
12.800         2
6.300          2
14.400         2
13.100         2
6.500          1
6.200          1
15.400         1
12.700         1
0.064          1
14.000         1
13.000         1
13.600         1
7.200          1
13.300         1
15.200         1
Name: arachidic-acid_100g, dtype: int64


 >    behenic-acid_100g
 ------------------------------------------------


NaN     320749
7.1          3
12.6         3
13.2         2
12.5         2
5.5          2
13.3         2
5.2          1
12.7         1
6.9          1
12.9         1
12.8         1
14.3         1
5.6          1
12.4         1
14.6         1
Name: behenic-acid_100g, dtype: int64


 >    lignoceric-acid_100g
 ------------------------------------------------


NaN    320772
Name: lignoceric-acid_100g, dtype: int64


 >    cerotic-acid_100g
 ------------------------------------------------


NaN    320772
Name: cerotic-acid_100g, dtype: int64


 >    montanic-acid_100g
 ------------------------------------------------


NaN     320771
61.0         1
Name: montanic-acid_100g, dtype: int64


 >    melissic-acid_100g
 ------------------------------------------------


NaN    320772
Name: melissic-acid_100g, dtype: int64


 >    monounsaturated-fat_100g
 ------------------------------------------------


NaN      297949
0.00       5757
66.67       630
7.14        591
8.93        441
          ...  
11.29         1
3.79          1
18.20         1
1.74          1
0.07          1
Name: monounsaturated-fat_100g, Length: 1067, dtype: int64


 >    polyunsaturated-fat_100g
 ------------------------------------------------


NaN      297913
0.00       6320
3.57        586
10.00       516
1.79        488
          ...  
10.60         1
5.48          1
10.59         1
3.72          1
7.31          1
Name: polyunsaturated-fat_100g, Length: 946, dtype: int64


 >    omega-3-fat_100g
 ------------------------------------------------


NaN      319931
2.000        44
1.800        31
1.000        23
2.300        22
          ...  
1.870         1
1.437         1
0.170         1
0.244         1
3.350         1
Name: omega-3-fat_100g, Length: 226, dtype: int64


 >    alpha-linolenic-acid_100g
 ------------------------------------------------


NaN      320586
0.200         8
7.000         7
0.100         6
0.048         5
          ...  
0.039         1
0.076         1
0.460         1
0.545         1
0.110         1
Name: alpha-linolenic-acid_100g, Length: 113, dtype: int64


 >    eicosapentaenoic-acid_100g
 ------------------------------------------------


NaN       320734
0.200          7
0.600          3
0.500          2
1.200          2
0.700          2
0.196          1
0.533          1
0.900          1
1.000          1
85.000         1
0.721          1
0.666          1
0.277          1
0.050          1
0.970          1
0.061          1
0.116          1
0.416          1
18.000         1
0.553          1
1.600          1
0.100          1
0.400          1
0.180          1
0.090          1
1.020          1
0.240          1
Name: eicosapentaenoic-acid_100g, dtype: int64


 >    docosahexaenoic-acid_100g
 ------------------------------------------------


NaN       320694
0.500          6
0.800          4
3.500          4
4.600          3
3.800          3
0.132          3
0.126          3
1.000          3
0.900          2
3.300          2
3.000          2
4.800          2
3.600          2
0.061          2
0.064          2
12.000         1
0.270          1
0.200          1
0.360          1
3.200          1
0.120          1
3.400          1
1.095          1
0.494          1
0.045          1
0.528          1
1.024          1
0.044          1
0.075          1
0.381          1
0.162          1
0.600          1
0.400          1
3.440          1
0.150          1
1.400          1
3.700          1
0.041          1
4.500          1
1.700          1
0.080          1
0.110          1
0.950          1
3.100          1
1.090          1
0.250          1
0.999          1
0.047          1
0.240          1
0.747          1
Name: docosahexaenoic-acid_100g, dtype: int64


 >    omega-6-fat_100g
 ------------------------------------------------


NaN     320584
1.1         11
23.0         8
19.0         7
7.6          7
         ...  
15.8         1
26.0         1
38.4         1
16.0         1
32.0         1
Name: omega-6-fat_100g, Length: 114, dtype: int64


 >    linoleic-acid_100g
 ------------------------------------------------


NaN       320623
0.500          5
23.000         5
4.200          4
0.437          4
           ...  
1.200          1
0.422          1
0.531          1
0.620          1
0.227          1
Name: linoleic-acid_100g, Length: 111, dtype: int64


 >    arachidonic-acid_100g
 ------------------------------------------------


NaN      320764
0.061         2
0.007         1
0.047         1
0.090         1
0.064         1
0.082         1
0.044         1
Name: arachidonic-acid_100g, dtype: int64


 >    gamma-linolenic-acid_100g
 ------------------------------------------------


NaN       320748
0.1700         4
0.1500         3
0.1300         3
0.1270         3
0.1600         3
0.2000         3
0.1630         1
0.0950         1
0.1100         1
0.2032         1
0.1400         1
Name: gamma-linolenic-acid_100g, dtype: int64


 >    dihomo-gamma-linolenic-acid_100g
 ------------------------------------------------


NaN         320749
0.066929         4
0.078740         3
0.062992         3
0.051181         3
0.050000         3
0.059055         3
0.043307         1
0.064000         1
0.080000         1
0.055118         1
Name: dihomo-gamma-linolenic-acid_100g, dtype: int64


 >    omega-9-fat_100g
 ------------------------------------------------


NaN      320751
29.00         2
75.00         2
70.00         2
27.00         2
31.00         1
68.00         1
1.00          1
68.80         1
7.70          1
27.10         1
1.10          1
61.00         1
26.00         1
53.00         1
50.00         1
8.35          1
39.00         1
Name: omega-9-fat_100g, dtype: int64


 >    oleic-acid_100g
 ------------------------------------------------


NaN      320759
11.00         1
6.90          1
21.40         1
40.50         1
70.00         1
10.80         1
15.70         1
5.17          1
1.08          1
8.45          1
53.70         1
5.90          1
76.00         1
Name: oleic-acid_100g, dtype: int64


 >    elaidic-acid_100g
 ------------------------------------------------


NaN    320772
Name: elaidic-acid_100g, dtype: int64


 >    gondoic-acid_100g
 ------------------------------------------------


NaN         320758
0.000001         8
0.000001         4
0.000003         2
Name: gondoic-acid_100g, dtype: int64


 >    mead-acid_100g
 ------------------------------------------------


NaN    320772
Name: mead-acid_100g, dtype: int64


 >    erucic-acid_100g
 ------------------------------------------------


NaN    320772
Name: erucic-acid_100g, dtype: int64


 >    nervonic-acid_100g
 ------------------------------------------------


NaN    320772
Name: nervonic-acid_100g, dtype: int64


 >    trans-fat_100g
 ------------------------------------------------


NaN      177474
0.00     140297
3.57         90
0.20         78
2.50         76
          ...  
1.53          1
7.69          1
2.61          1
10.87         1
4.05          1
Name: trans-fat_100g, Length: 429, dtype: int64


 >    cholesterol_100g
 ------------------------------------------------


NaN        176682
0.00000     89441
0.07100      2462
0.10700      2237
0.01200      1909
            ...  
0.03650         1
0.02340         1
0.00161         1
0.06210         1
0.48800         1
Name: cholesterol_100g, Length: 538, dtype: int64


 >    carbohydrates_100g
 ------------------------------------------------


NaN      77184
0.00     21607
3.57      4240
50.00     3167
6.67      2950
         ...  
12.97        1
27.74        1
40.56        1
65.49        1
46.24        1
Name: carbohydrates_100g, Length: 5417, dtype: int64


 >    sugars_100g
 ------------------------------------------------


NaN      75801
0.00     37077
3.57      7148
0.50      4589
3.33      3706
         ...  
81.90        1
48.08        1
56.27        1
32.95        1
81.48        1
Name: sugars_100g, Length: 4069, dtype: int64


 >    sucrose_100g
 ------------------------------------------------


NaN     320700
0.0          8
26.0         4
7.5          4
0.3          3
23.0         3
8.0          3
0.1          3
4.7          2
5.9          2
30.0         1
0.2          1
7.0          1
5.5          1
37.0         1
8.4          1
9.5          1
2.2          1
17.0         1
92.8         1
27.9         1
14.0         1
12.0         1
0.9          1
34.0         1
8.3          1
8.2          1
13.0         1
10.8         1
0.8          1
10.7         1
9.8          1
6.7          1
9.0          1
24.9         1
15.6         1
13.9         1
8.8          1
10.0         1
9.9          1
6.0          1
11.9         1
3.1          1
7.4          1
18.0         1
20.1         1
16.0         1
16.6         1
43.5         1
1.8          1
Name: sucrose_100g, dtype: int64


 >    glucose_100g
 ------------------------------------------------


NaN      320746
0.20          7
0.00          4
1.40          2
0.10          2
0.30          1
0.37          1
1.20          1
2.00          1
0.38          1
6.00          1
23.20         1
8.10          1
23.00         1
0.90          1
5.00          1
Name: glucose_100g, dtype: int64


 >    fructose_100g
 ------------------------------------------------


NaN      320734
0.0           4
55.1          3
0.2           2
63.9          2
0.1           2
3.0           2
25.0          2
2.4           1
101.0         1
60.0          1
0.9           1
35.1          1
43.0          1
1.1           1
2.3           1
70.0          1
100.0         1
6.0           1
28.0          1
26.0          1
31.5          1
18.0          1
1.5           1
29.0          1
69.0          1
0.3           1
8.0           1
1.3           1
Name: fructose_100g, dtype: int64


 >    lactose_100g
 ------------------------------------------------


NaN      320510
0.00        121
0.10         27
2.98         21
0.50          9
          ...  
2.00          1
49.90         1
30.30         1
42.70         1
6.22          1
Name: lactose_100g, Length: 66, dtype: int64


 >    maltose_100g
 ------------------------------------------------


NaN     320768
22.0         1
0.1          1
39.2         1
36.0         1
Name: maltose_100g, dtype: int64


 >    maltodextrins_100g
 ------------------------------------------------


NaN     320761
1.5          2
14.8         1
9.0          1
10.3         1
13.5         1
1.8          1
27.5         1
20.0         1
17.8         1
16.1         1
Name: maltodextrins_100g, dtype: int64


 >    starch_100g
 ------------------------------------------------


NaN     320506
0.0         24
71.0        10
50.0         9
0.5          7
         ...  
35.7         1
1.6          1
66.4         1
34.8         1
47.4         1
Name: starch_100g, Length: 139, dtype: int64


 >    polyols_100g
 ------------------------------------------------


NaN     320358
96.0        20
97.0        17
0.0         14
98.0         8
         ...  
23.2         1
58.5         1
9.6          1
18.9         1
9.3          1
Name: polyols_100g, Length: 207, dtype: int64


 >    fiber_100g
 ------------------------------------------------


NaN      119886
0.00      68833
3.60       8525
3.30       3991
1.80       3886
          ...  
7.34          1
2.01          1
4.74          1
3.74          1
87.50         1
Name: fiber_100g, Length: 1017, dtype: int64


 >    proteins_100g
 ------------------------------------------------


NaN      60850
0.00     53631
7.14      5706
25.00     4026
3.33      3852
         ...  
30.99        1
24.65        1
15.83        1
25.82        1
60.71        1
Name: proteins_100g, Length: 2504, dtype: int64


 >    casein_100g
 ------------------------------------------------


NaN      320745
1.40          5
2.90          5
7.40          2
1.10          1
7.20          1
4.80          1
3.90          1
4.95          1
7.00          1
4.90          1
9.30          1
10.20         1
5.20          1
10.70         1
0.92          1
6.70          1
8.40          1
4.20          1
Name: casein_100g, dtype: int64


 >    serum-proteins_100g
 ------------------------------------------------


NaN    320756
0.3         5
5.8         2
1.9         2
5.2         1
0.4         1
3.0         1
2.0         1
2.6         1
4.6         1
5.4         1
Name: serum-proteins_100g, dtype: int64


 >    nucleotides_100g
 ------------------------------------------------


NaN       320763
0.0240         2
0.0220         2
0.0230         1
0.0155         1
0.0180         1
0.0216         1
0.0250         1
Name: nucleotides_100g, dtype: int64


 >    salt_100g
 ------------------------------------------------


NaN          65262
0.000000     34174
0.010000      3692
0.100000      3467
1.000000      2231
             ...  
16.140000        1
0.000847         1
0.156000         1
52.613560        1
7.015480         1
Name: salt_100g, Length: 5587, dtype: int64


 >    sodium_100g
 ------------------------------------------------


NaN          65309
0.000000     34131
0.003937      3687
0.039370      3451
0.393701      2216
             ...  
2.103000         1
2.327000         1
0.009750         1
26.860000        1
20.312000        1
Name: sodium_100g, Length: 5292, dtype: int64


 >    alcohol_100g
 ------------------------------------------------


NaN     316639
0.0       1555
12.5       187
40.0       170
5.0        160
         ...  
36.6         1
71.0         1
49.0         1
36.0         1
54.2         1
Name: alcohol_100g, Length: 151, dtype: int64


 >    vitamin-a_100g
 ------------------------------------------------


NaN         183218
0.000000     78620
0.000321      2604
0.000214      2333
0.000107      1825
             ...  
0.000364         1
0.001967         1
0.001347         1
0.001633         1
0.001035         1
Name: vitamin-a_100g, Length: 2346, dtype: int64


 >    beta-carotene_100g
 ------------------------------------------------


NaN          320738
0.140000          2
0.001600          2
0.000000          2
0.240000          2
0.001000          2
0.085000          1
0.000700          1
0.260000          1
0.002900          1
0.006000          1
0.002400          1
0.001100          1
0.004250          1
0.000500          1
0.060000          1
0.000300          1
0.380000          1
0.280000          1
0.074900          1
0.000812          1
0.053000          1
0.067000          1
15.000000         1
0.071000          1
0.311000          1
0.004522          1
0.003200          1
0.002530          1
0.200000          1
Name: beta-carotene_100g, dtype: int64


 >    vitamin-d_100g
 ------------------------------------------------


NaN             313715
1.050000e-06      1627
0.000000e+00       345
3.325000e-06       319
4.250000e-07       235
                 ...  
1.975000e-06         1
2.500000e-07         1
2.142500e-05         1
3.650000e-06         1
2.100000e-05         1
Name: vitamin-d_100g, Length: 353, dtype: int64


 >    vitamin-e_100g
 ------------------------------------------------


NaN        319432
0.00180       148
0.01200        68
0.00360        37
0.01000        34
            ...  
0.03100         1
0.00810         1
0.00042         1
0.02100         1
0.01820         1
Name: vitamin-e_100g, Length: 273, dtype: int64


 >    vitamin-k_100g
 ------------------------------------------------


NaN         319854
0.000000        46
0.000029        27
0.000029        23
0.000006        22
             ...  
0.000039         1
0.000022         1
0.000044         1
0.001297         1
0.000007         1
Name: vitamin-k_100g, Length: 308, dtype: int64


 >    vitamin-c_100g
 ------------------------------------------------


NaN        179905
0.00000     90500
0.00100      1359
0.00400      1225
0.00430      1219
            ...  
0.08940         1
0.12100         1
0.12800         1
0.16250         1
0.00942         1
Name: vitamin-c_100g, Length: 1171, dtype: int64


 >    vitamin-b1_100g
 ------------------------------------------------


NaN         309618
0.00500        559
0.00400        494
0.00300        492
0.00800        464
             ...  
1.21600          1
0.00078          1
0.30300          1
0.00220          1
67.30800         1
Name: vitamin-b1_100g, Length: 686, dtype: int64


 >    vitamin-b2_100g
 ------------------------------------------------


NaN         309957
0.004554       527
0.000000       361
0.014167       350
0.455357       319
             ...  
0.140496         1
0.004380         1
0.124390         1
0.229730         1
0.764045         1
Name: vitamin-b2_100g, Length: 1072, dtype: int64


 >    vitamin-pp_100g
 ------------------------------------------------


NaN         309043
0.007143       866
0.005357       587
0.016667       334
0.002857       295
             ...  
0.017600         1
0.040708         1
0.001633         1
0.025100         1
0.000294         1
Name: vitamin-pp_100g, Length: 865, dtype: int64


 >    vitamin-b6_100g
 ------------------------------------------------


NaN         313988
0.001667       333
0.000083       258
0.000385       166
0.000286       152
             ...  
0.000566         1
0.000432         1
0.001480         1
0.000165         1
0.000156         1
Name: vitamin-b6_100g, Length: 606, dtype: int64


 >    vitamin-b9_100g
 ------------------------------------------------


NaN         315532
0.000056       527
0.000030       512
0.000028       290
0.000032       137
             ...  
0.000202         1
0.000360         1
0.000018         1
0.003509         1
0.000016         1
Name: vitamin-b9_100g, Length: 385, dtype: int64


 >    folates_100g
 ------------------------------------------------


NaN         317730
0.000214       341
0.000179       158
0.000000       124
0.000029       111
             ...  
0.000124         1
0.000225         1
0.000183         1
0.000054         1
0.000022         1
Name: folates_100g, Length: 261, dtype: int64


 >    vitamin-b12_100g
 ------------------------------------------------


NaN             315472
5.000000e-06       319
2.500000e-07       286
1.250000e-06       190
2.500000e-06       174
                 ...  
1.730000e-06         1
2.340000e-06         1
5.210000e-07         1
8.400000e-06         1
1.380000e-06         1
Name: vitamin-b12_100g, Length: 397, dtype: int64


 >    biotin_100g
 ------------------------------------------------


NaN         320442
0.000008        50
0.000041        25
0.000015        11
0.000050        11
             ...  
0.000008         1
0.000013         1
0.001200         1
0.000179         1
0.000002         1
Name: biotin_100g, Length: 96, dtype: int64


 >    pantothenic-acid_100g
 ------------------------------------------------


NaN             318289
4.170000e-04       237
0.000000e+00        97
1.429000e-03        95
5.000000e-03        76
                 ...  
8.000000e-07         1
4.580000e-03         1
1.200000e-02         1
3.850000e-03         1
7.692300e-02         1
Name: pantothenic-acid_100g, Length: 467, dtype: int64


 >    silica_100g
 ------------------------------------------------


NaN         320734
0.001500         7
0.003500         2
0.003200         2
0.003170         2
0.003270         2
0.000820         2
0.000750         2
0.001912         2
0.007100         2
0.036000         1
0.002700         1
0.004850         1
0.000890         1
0.004940         1
0.014000         1
0.001130         1
0.000008         1
0.015000         1
0.032700         1
0.250000         1
0.007700         1
0.036200         1
0.007600         1
0.027000         1
Name: silica_100g, dtype: int64


 >    bicarbonate_100g
 ------------------------------------------------


NaN       320691
0.0360         6
0.0135         3
0.0312         2
0.0074         2
           ...  
0.1350         1
0.2400         1
0.0305         1
0.0262         1
0.0250         1
Name: bicarbonate_100g, Length: 67, dtype: int64


 >    potassium_100g
 ------------------------------------------------


NaN        296024
0.00000       775
0.18800       601
0.16700       374
0.10000       321
            ...  
1.30400         1
0.59200         1
1.17400         1
0.00338         1
0.47300         1
Name: potassium_100g, Length: 1188, dtype: int64


 >    chloride_100g
 ------------------------------------------------


NaN        320614
0.05000         8
0.00068         6
0.00150         5
0.05200         4
            ...  
0.00046         1
0.53100         1
0.58900         1
0.25000         1
0.07200         1
Name: chloride_100g, Length: 100, dtype: int64


 >    calcium_100g
 ------------------------------------------------


NaN       179722
0.0000     51048
0.0710      5621
0.0670      3026
0.7140      3024
           ...  
0.7090         1
0.9720         1
0.0176         1
0.6180         1
1.1840         1
Name: calcium_100g, Length: 1124, dtype: int64


 >    phosphorus_100g
 ------------------------------------------------


NaN       314927
0.1430       255
0.5360       245
0.0080       241
0.3570       236
           ...  
0.5650         1
2.8900         1
0.0588         1
0.3610         1
0.0322         1
Name: phosphorus_100g, Length: 570, dtype: int64


 >    iron_100g
 ------------------------------------------------


NaN        180310
0.00000     42678
0.00129      6502
0.00257      4237
0.00240      2525
            ...  
0.00745         1
0.02840         1
0.00414         1
0.00788         1
0.00510         1
Name: iron_100g, Length: 1138, dtype: int64


 >    magnesium_100g
 ------------------------------------------------


NaN        314519
0.00700       288
0.01000       259
0.14300       225
0.05700       212
            ...  
0.00450         1
0.34500         1
0.00186         1
3.54000         1
1.23100         1
Name: magnesium_100g, Length: 607, dtype: int64


 >    zinc_100g
 ------------------------------------------------


NaN        316843
0.01250       262
0.00107       119
0.01172        84
0.00536        83
            ...  
0.08333         1
0.06000         1
0.00106         1
0.00480         1
0.00510         1
Name: zinc_100g, Length: 416, dtype: int64


 >    copper_100g
 ------------------------------------------------


NaN         318666
0.001000       110
0.001071       108
0.001429        98
0.000571        90
             ...  
0.000059         1
0.000377         1
0.000088         1
0.000510         1
0.001571         1
Name: copper_100g, Length: 310, dtype: int64


 >    manganese_100g
 ------------------------------------------------


NaN         319152
0.000000       494
0.002000       451
0.001000       337
0.003000        99
             ...  
0.000855         1
0.015000         1
0.000115         1
0.000630         1
0.002140         1
Name: manganese_100g, Length: 108, dtype: int64


 >    fluoride_100g
 ------------------------------------------------


NaN             320693
1.500000e-05         9
6.000000e-05         6
1.000000e-05         5
2.000000e-05         5
5.000000e-05         4
4.500000e-04         4
2.500000e-02         3
4.000000e-05         3
5.000000e-04         3
1.200000e-04         2
1.000000e-03         2
0.000000e+00         1
3.100000e-02         1
1.000000e-07         1
2.700000e-05         1
7.700000e-04         1
1.400000e-04         1
1.550000e-04         1
4.960000e-04         1
1.700000e-04         1
2.000000e-03         1
1.040000e-05         1
1.200000e-03         1
4.000000e-04         1
1.300000e-05         1
2.200000e-04         1
3.000000e-04         1
1.000000e-04         1
2.300000e-01         1
3.300000e-05         1
7.000000e-05         1
5.600000e-01         1
9.000000e-06         1
1.600000e-04         1
4.810000e-04         1
9.000000e-05         1
4.600000e-04         1
2.300000e-04         1
2.700000e-06         1
1.300000e-04         1
3.900000e-05         1
5.000000e-02         1
4.900000e-0


 >    selenium_100g
 ------------------------------------------------


NaN         319604
0.000004        61
0.000003        60
0.000005        58
0.000002        54
             ...  
0.000230         1
0.030000         1
0.000035         1
0.000115         1
0.001935         1
Name: selenium_100g, Length: 161, dtype: int64


 >    chromium_100g
 ------------------------------------------------


NaN         320752
0.000011         3
0.000008         2
0.000019         1
0.000045         1
0.000007         1
0.003010         1
0.030000         1
0.000100         1
0.000020         1
0.000267         1
0.000083         1
0.000021         1
0.000014         1
0.000026         1
0.000051         1
0.000063         1
0.000030         1
Name: chromium_100g, dtype: int64


 >    molybdenum_100g
 ------------------------------------------------


NaN         320761
0.000039         2
0.000005         2
0.000007         1
0.000032         1
0.000039         1
0.000333         1
0.000045         1
0.003760         1
0.000104         1
Name: molybdenum_100g, dtype: int64


 >    iodine_100g
 ------------------------------------------------


NaN         320513
0.001750        17
0.000015        13
0.000012        12
0.000010        12
             ...  
0.000005         1
0.000022         1
0.000011         1
0.000059         1
0.000072         1
Name: iodine_100g, Length: 112, dtype: int64


 >    caffeine_100g
 ------------------------------------------------


NaN         320694
0.00000         10
0.02000         10
0.03200          8
0.02100          4
0.01800          3
4.00000          3
0.03330          2
0.01200          2
0.80000          2
0.01700          2
0.08000          1
20.00000         1
0.00958          1
0.02900          1
0.09000          1
0.03800          1
0.06300          1
8.42000          1
0.00400          1
1.00000          1
2.80000          1
0.04400          1
0.05200          1
33.30000         1
0.02800          1
0.03100          1
42.28000         1
1.25000          1
0.05000          1
0.01000          1
0.04700          1
0.00970          1
0.00783          1
0.01900          1
0.01070          1
0.00300          1
0.03300          1
0.01500          1
0.25000          1
0.01850          1
0.04000          1
0.02500          1
Name: caffeine_100g, dtype: int64


 >    taurine_100g
 ------------------------------------------------


NaN       320743
0.4000         5
0.0350         5
0.4230         2
0.0395         2
0.0360         2
0.0390         2
0.0330         1
0.4170         1
0.0400         1
0.0410         1
0.3030         1
0.0370         1
0.0290         1
0.0380         1
0.0320         1
0.0018         1
0.0053         1
Name: taurine_100g, dtype: int64


 >    ph_100g
 ------------------------------------------------


NaN       320723
7.2000         9
6.0000         4
7.6000         3
7.7000         3
6.8000         3
6.6000         3
7.0000         3
7.5000         3
7.4000         2
7.9000         1
6.9300         1
8.2000         1
5.1000         1
5.4000         1
0.0064         1
6.7000         1
7.3200         1
6.3000         1
0.0050         1
0.0000         1
6.2000         1
5.8500         1
8.4000         1
7.3400         1
0.0078         1
Name: ph_100g, dtype: int64


 >    fruits-vegetables-nuts_100g
 ------------------------------------------------


NaN      317736
0.0         962
50.0        305
100.0       181
60.0         73
          ...  
16.8          1
60.5          1
84.6          1
41.3          1
93.9          1
Name: fruits-vegetables-nuts_100g, Length: 334, dtype: int64


 >    collagen-meat-protein-ratio_100g
 ------------------------------------------------


NaN     320607
15.0        97
12.0        43
25.0        18
18.0         3
20.0         3
8.0          1
Name: collagen-meat-protein-ratio_100g, dtype: int64


 >    cocoa_100g
 ------------------------------------------------


NaN     319824
30.0       150
70.0        85
50.0        48
52.0        48
         ...  
6.0          1
86.0         1
62.0         1
61.0         1
88.0         1
Name: cocoa_100g, Length: 85, dtype: int64


 >    chlorophyl_100g
 ------------------------------------------------


NaN    320772
Name: chlorophyl_100g, dtype: int64


 >    carbon-footprint_100g
 ------------------------------------------------


NaN      320504
0.0          20
828.0         5
160.0         4
150.0         4
          ...  
248.2         1
527.0         1
142.7         1
810.0         1
161.0         1
Name: carbon-footprint_100g, Length: 203, dtype: int64


 >    nutrition-score-fr_100g
 ------------------------------------------------


 NaN     99562
 0.0     12763
 1.0     11268
 14.0    11253
 2.0     10604
 13.0     8827
-1.0      8804
 12.0     8658
 11.0     8653
 3.0      7857
 15.0     7529
 10.0     6965
 20.0     6902
 16.0     6687
 21.0     6416
 9.0      6374
 4.0      6163
-2.0      6161
 19.0     6122
 17.0     6063
 18.0     5748
 8.0      5170
-3.0      5108
-6.0      4925
 5.0      4848
 6.0      4653
 23.0     4551
 22.0     4455
-4.0      4412
 24.0     4139
 7.0      4136
-5.0      4106
 25.0     2879
 26.0     2623
 27.0     1694
-7.0       950
 28.0      680
-8.0       601
-9.0       315
 29.0      279
 30.0      207
-10.0      159
 33.0      105
-11.0       90
 31.0       79
 32.0       73
-12.0       46
 35.0       36
-13.0       23
 34.0       20
 36.0       17
-14.0        5
 40.0        4
 37.0        3
-15.0        1
 38.0        1
Name: nutrition-score-fr_100g, dtype: int64


 >    nutrition-score-uk_100g
 ------------------------------------------------


 NaN     99562
 0.0     13588
 1.0     11932
 2.0     11083
 14.0    10689
-1.0      8827
 13.0     8409
 12.0     8239
 11.0     8093
 3.0      7620
 20.0     7390
 15.0     6921
 21.0     6650
 16.0     6509
 10.0     6504
 19.0     6476
-2.0      6260
-3.0      6116
 9.0      6085
 17.0     6078
 18.0     5790
 4.0      5687
 8.0      4945
-6.0      4926
-4.0      4819
 23.0     4715
 22.0     4632
 5.0      4555
 6.0      4334
 24.0     4282
-5.0      4211
 7.0      3961
 25.0     2887
 26.0     2643
 27.0     1698
-7.0       963
 28.0      672
-8.0       602
-9.0       315
 29.0      272
 30.0      192
-10.0      157
 33.0      101
-11.0       90
 31.0       76
 32.0       64
-12.0       46
 35.0       34
-13.0       23
 34.0       20
 36.0       17
-14.0        5
 40.0        3
 37.0        2
-15.0        1
 38.0        1
Name: nutrition-score-uk_100g, dtype: int64


 >    glycemic-index_100g
 ------------------------------------------------


NaN    320772
Name: glycemic-index_100g, dtype: int64


 >    water-hardness_100g
 ------------------------------------------------


NaN    320772
Name: water-hardness_100g, dtype: int64

In [13]:
clean_data = raw_data

# Drop columns, then rows, that contain only missing values.
# The pass is repeated three times, as in the original cell;
# info() is called after each step to display data types and empty values.
for _ in range(3):
    clean_data = clean_data.dropna(axis='columns', how='all')
    # Display data types and empty values
    clean_data.info()

    clean_data = clean_data.dropna(axis='index', how='all')
    # Display data types and empty values
    clean_data.info()


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 320772 entries, 0 to 320771
Columns: 146 entries, code to nutrition-score-uk_100g
dtypes: float64(90), object(56)
memory usage: 357.3+ MB
<class 'pandas.core.frame.DataFrame'>
Int64Index: 320772 entries, 0 to 320771
Columns: 146 entries, code to nutrition-score-uk_100g
dtypes: float64(90), object(56)
memory usage: 359.8+ MB
<class 'pandas.core.frame.DataFrame'>
Int64Index: 320772 entries, 0 to 320771
Columns: 146 entries, code to nutrition-score-uk_100g
dtypes: float64(90), object(56)
memory usage: 359.8+ MB
<class 'pandas.core.frame.DataFrame'>
Int64Index: 320772 entries, 0 to 320771
Columns: 146 entries, code to nutrition-score-uk_100g
dtypes: float64(90), object(56)
memory usage: 359.8+ MB
<class 'pandas.core.frame.DataFrame'>
Int64Index: 320772 entries, 0 to 320771
Columns: 146 entries, code to nutrition-score-uk_100g
dtypes: float64(90), object(56)
memory usage: 359.8+ MB
<class 'pandas.core.frame.DataFrame'>
Int64Index: 320772 ent
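
The cleaning cell above applies the same two `dropna` calls several times. The sketch below uses a hypothetical toy frame (not the real dataset; the values are made up for illustration) and suggests that a single pass over columns and then rows is already stable: removing all-empty rows cannot create a new all-empty column.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame: one fully empty column and one fully empty row.
toy = pd.DataFrame({
    "product_name": ["Peanuts", np.nan, "Organic Polenta"],
    "sugars_100g": [3.57, np.nan, 0.5],
    "glycemic-index_100g": [np.nan, np.nan, np.nan],
})

# First pass: drop all-NaN columns, then all-NaN rows.
once = toy.dropna(axis="columns", how="all").dropna(axis="index", how="all")

# Second pass: the same two calls applied again.
twice = once.dropna(axis="columns", how="all").dropna(axis="index", how="all")

print(once.shape)          # shape after one pass
print(once.equals(twice))  # whether the second pass changed anything
```

On this toy input the second pass leaves the frame unchanged, so the repeated calls in the cell only re-display the same `info()` summary.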

In [None]:

# display first 5 rows
clean_data.head()


In [18]:

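# Columns to skip while loading.
# Note: 'nutrition-score-uk_100g ' carries a trailing space, so it never matches
# a real column name and that column is still loaded (it appears in the output below).
excluded_columns = ['code', 'url', 'creator', 'created_t', 'created_datetime', 'last_modified_t', 'last_modified_datetime',
    'image_url', 'image_small_url', 'nutrition-score-uk_100g ']

raw_data = pd.read_csv(
    csv_local_path,
    sep='\t',
    usecols=lambda column_name: column_name not in excluded_columns,
)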

# display first 5 rows
raw_data.head()



Unnamed: 0,product_name,generic_name,quantity,packaging,packaging_tags,brands,brands_tags,categories,categories_tags,categories_fr,...,ph_100g,fruits-vegetables-nuts_100g,collagen-meat-protein-ratio_100g,cocoa_100g,chlorophyl_100g,carbon-footprint_100g,nutrition-score-fr_100g,nutrition-score-uk_100g,glycemic-index_100g,water-hardness_100g
0,Farine de blé noir,,1kg,,,Ferme t'y R'nao,ferme-t-y-r-nao,,,,...,,,,,,,,,,
1,Banana Chips Sweetened (Whole),,,,,,,,,,...,,,,,,,14.0,14.0,,
2,Peanuts,,,,,Torn & Glasser,torn-glasser,,,,...,,,,,,,0.0,0.0,,
3,Organic Salted Nut Mix,,,,,Grizzlies,grizzlies,,,,...,,,,,,,12.0,12.0,,
4,Organic Polenta,,,,,Bob's Red Mill,bob-s-red-mill,,,,...,,,,,,,,,,


In [19]:

# Display data types and empty values
raw_data.info()


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 320772 entries, 0 to 320771
Columns: 153 entries, product_name to water-hardness_100g
dtypes: float64(106), object(47)
memory usage: 374.4+ MB


In [20]:

# Display statistical summary of each column
raw_data.describe(include="all")


Unnamed: 0,product_name,generic_name,quantity,packaging,packaging_tags,brands,brands_tags,categories,categories_tags,categories_fr,...,ph_100g,fruits-vegetables-nuts_100g,collagen-meat-protein-ratio_100g,cocoa_100g,chlorophyl_100g,carbon-footprint_100g,nutrition-score-fr_100g,nutrition-score-uk_100g,glycemic-index_100g,water-hardness_100g
count,303010,52795,104819,78960,78961,292360,292352,84410,84389,84411,...,49.0,3036.0,165.0,948.0,0.0,268.0,221210.0,221210.0,0.0,0.0
unique,221347,38584,13826,14547,12064,58784,50253,36982,21142,21152,...,,,,,,,,,,
top,Ice Cream,Pâtes alimentaires au blé dur de qualité supér...,500 g,Carton,"sachet,plastique",Carrefour,carrefour,"Snacks sucrés,Biscuits et gâteaux,Biscuits","en:sugary-snacks,en:biscuits-and-cakes,en:bisc...","Snacks sucrés,Biscuits et gâteaux,Biscuits",...,,,,,,,,,,
freq,410,201,4669,2153,3959,2978,3149,301,802,802,...,,,,,,,,,,
mean,,,,,,,,,,,...,6.425698,31.458587,15.412121,49.547785,,341.700764,9.165535,9.058049,,
std,,,,,,,,,,,...,2.047841,31.967918,3.753028,18.757932,,425.211439,9.055903,9.183589,,
min,,,,,,,,,,,...,0.0,0.0,8.0,6.0,,0.0,-15.0,-15.0,,
25%,,,,,,,,,,,...,6.3,0.0,12.0,32.0,,98.75,1.0,1.0,,
50%,,,,,,,,,,,...,7.2,23.0,15.0,50.0,,195.75,10.0,9.0,,
75%,,,,,,,,,,,...,7.4,51.0,15.0,64.25,,383.2,16.0,16.0,,


For each tree listed, the following information is available (column descriptions are given on the [OpenData](https://opendata.paris.fr/explore/dataset/les-arbres/information/) site):
- `id`: plain identifier of the tree (integer, e.g. `99874`)
- `type_emplacement`: type of location (text, e.g. `"Arbre"`)
- `domanialite`: type of place the tree belongs to (text, e.g. `"Jardin"`)
- `arrondissement`: arrondissement of Paris where the tree stands (text, e.g. `"PARIS 7E ARRDT"`)
- `complement_addresse`: additional address details (text, no example visible)
- `numero`: street number of the address (text, no example visible)
- `lieu`: address of the tree (text, e.g. `"MAIRIE DU 7E 116 RUE DE GRENELLE PARIS 7E"`)
- `id_emplacement`: identifier of the location (text, e.g. `"19"`)
- `libelle_francais`: common (vernacular) name of the tree species (text, e.g. `"Marronnier"`)
- `genre`: genus of the tree (text, e.g. `"Aesculus"`)
- `espece`: species of the tree (text, e.g. `"hippocastanum"`)
- `variete`: variety of the tree (text, no example visible)
- `circonference_cm`: circumference of the tree in centimetres (integer, e.g. `20`)
- `hauteur_m`: height of the tree in metres (integer, e.g. `5`)
- `stade_developpement`: development stage of the tree (text, e.g. `"A"` for "Adulte", i.e. adult)
- `remarquable`: whether or not the tree is "remarkable" (boolean, e.g. `0` for a "non-remarkable" tree)
- `geo_point_2d_a`: latitude of the tree's position (decimal number, e.g. `48.857620`)
- `geo_point_2d_b`: longitude of the tree's position (decimal number, e.g. `2.320962`)

From the first few rows alone, we can already see that:
- a number of values are missing (`NaN` = "Not a Number" = data not available)
- the variables can be classified by type:
    - quantitative
        - discrete: `id`, `circonference_cm`, `hauteur_m`
        - continuous: `geo_point_2d_a`, `geo_point_2d_b`
    - qualitative
        - nominal: `type_emplacement`, `domanialite`, `arrondissement`, `complement_addresse`, `numero`, `lieu`, `id_emplacement`, `libelle_francais`, `genre`, `espece`, `variete`
        - ordinal: `stade_developpement`, `remarquable`
- they can also be grouped into three broad categories, according to their meaning:
    - metadata internal to the system: `id`, `id_emplacement`, `type_emplacement`
    - location data: `arrondissement`, `complement_addresse`, `numero`, `lieu`, `geo_point_2d_a`, `geo_point_2d_b`
    - descriptive data:
        - size: `circonference_cm`, `hauteur_m` and `stade_developpement`
        - type: `libelle_francais`, `genre`, `espece` and `variete`
        - other: `remarquable`
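
The type classification above can be written down as a small lookup structure. This is only an illustrative sketch: the dictionary name `COLUMN_GROUPS`, the group keys and the helper `select_group` are assumptions introduced here, and the one-row sample is built solely from the example values quoted in the column list (columns with no visible example are left empty).

```python
import pandas as pd

# Grouping copied from the classification above; names and layout are illustrative.
COLUMN_GROUPS = {
    "quantitative_discrete": ["id", "circonference_cm", "hauteur_m"],
    "quantitative_continuous": ["geo_point_2d_a", "geo_point_2d_b"],
    "qualitative_nominal": [
        "type_emplacement", "domanialite", "arrondissement",
        "complement_addresse", "numero", "lieu", "id_emplacement",
        "libelle_francais", "genre", "espece", "variete",
    ],
    "qualitative_ordinal": ["stade_developpement", "remarquable"],
}

# One-row sample using the example values from the column list.
sample = pd.DataFrame([{
    "id": 99874,
    "type_emplacement": "Arbre",
    "domanialite": "Jardin",
    "arrondissement": "PARIS 7E ARRDT",
    "complement_addresse": None,  # no example visible
    "numero": None,               # no example visible
    "lieu": "MAIRIE DU 7E 116 RUE DE GRENELLE PARIS 7E",
    "id_emplacement": "19",
    "libelle_francais": "Marronnier",
    "genre": "Aesculus",
    "espece": "hippocastanum",
    "variete": None,              # no example visible
    "circonference_cm": 20,
    "hauteur_m": 5,
    "stade_developpement": "A",
    "remarquable": 0,
    "geo_point_2d_a": 48.857620,
    "geo_point_2d_b": 2.320962,
}])


def select_group(df: pd.DataFrame, group: str) -> pd.DataFrame:
    """Return only the columns of `df` that belong to the given group."""
    return df[COLUMN_GROUPS[group]]


# Every column should be classified exactly once.
classified = [name for names in COLUMN_GROUPS.values() for name in names]
unclassified = set(sample.columns) - set(classified)

print(select_group(sample, "quantitative_continuous"))
print("Unclassified columns:", unclassified)
```

Keeping the grouping in one place makes it easy to apply numeric summaries to the quantitative groups and frequency counts to the qualitative ones.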


Let us look more closely at the value types and the missing values:
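
One possible way to inspect types and missing values is a per-column summary. The sketch below is not the notebook's original cell: the helper name `summarize_columns` and the toy rows are assumptions made for illustration, and only standard pandas calls (`dtypes`, `isna`, `sum`, `mean`, `sort_values`) are used.

```python
import pandas as pd


def summarize_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Return one row per column: dtype, missing count and missing share (%)."""
    summary = pd.DataFrame({
        "dtype": df.dtypes.astype(str),
        "missing": df.isna().sum(),
        "missing_pct": (df.isna().mean() * 100).round(1),
    })
    # Columns with the most missing values come first.
    return summary.sort_values("missing", ascending=False)


# Hypothetical toy rows (made-up values, not taken from the real dataset).
toy = pd.DataFrame({
    "genre": ["Aesculus", "Platanus", None, "Tilia"],
    "hauteur_m": [5, 12, 8, 10],
    "variete": [None, None, None, None],
})

summary = summarize_columns(toy)
print(summary)
```

Sorting by the number of missing values puts the emptiest columns at the top, which helps decide which ones are usable for the analysis.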

We see here that most of the trees are adult plane trees.
This information helps optimize the purchase and storage of the equipment and products suited specifically to maintaining these trees.
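
A finding of this kind can be checked by counting (species, development stage) pairs. The sketch below runs on a hypothetical toy sample whose rows are made up for illustration, so its counts say nothing about the real data; the stage code `"J"` is also an assumption, since only `"A"` is documented above.

```python
import pandas as pd

# Hypothetical toy sample: made-up rows, NOT real counts from the dataset.
toy = pd.DataFrame({
    "libelle_francais": ["Platane", "Platane", "Marronnier", "Platane", "Tilleul"],
    "stade_developpement": ["A", "A", "A", "J", "A"],  # "J" is an assumed code
})

# Count each (species, stage) pair and put the most frequent first.
pair_counts = (
    toy.groupby(["libelle_francais", "stade_developpement"])
    .size()
    .sort_values(ascending=False)
)

top_species, top_stage = pair_counts.index[0]
print(pair_counts)
print("Most frequent pair:", top_species, top_stage)
```

On the real table, the same grouping would be applied to the full `libelle_francais` and `stade_developpement` columns.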

---

_[GPL-v3 License](https://github.com/fleuryc/oc_ingenieur-ia_P2-Participez-a-un-concours-sur-la-Smart-City/blob/main/LICENSE)_
