# Tutorial 3 - Python For Data Analysis 🐍

---

## Pandas

### *Table of Contents*

- Pandas 🐼
  - [Introduction](#intro)
  - [Exercise 1](#exercise-5) : Explore Pandas DataFrame basic functions
  - [Exercise 2](#exercise-6) : Explore traffic accidents vehicles

In [1]:
import pandas as pd


## Pandas 🐼

<a name="intro"></a>

### Introduction - Traffic Accidents Database
---

https://www.data.gouv.fr/fr/datasets/base-de-donnees-accidents-corporels-de-la-circulation/

For each personal injury accident, the law enforcement unit that attended the scene records information describing the accident. These records are gathered in a form called the personal injury accident analysis report. Together, these forms make up the national traffic accident file, known as the "BAAC file", administered by the National Interministerial Road Safety Observatory (ONISR).

The databases, extracted from the BAAC file, list all personal injury accidents that occurred during a given year in mainland France, in the overseas departments (Guadeloupe, French Guiana, Martinique, Réunion, and Mayotte since 2012) and in the other overseas territories (Saint-Pierre-et-Miquelon, Saint-Barthélemy, Saint-Martin, Wallis-et-Futuna, French Polynesia and New Caledonia; available only from 2019 in open data), with a simplified description. This includes the accident location as entered, the characteristics of the accident and its location, the vehicles involved, and their victims.

The databases from 2005 to 2019 are now annual and made up of 4 files (Characteristics - Locations - Vehicles - Users) in csv format.

Links to databases:
- https://www.data.gouv.fr/fr/datasets/r/be2191a6-a7cd-446f-a9fc-8d698688eb9e
- https://www.data.gouv.fr/fr/datasets/r/e4c6f4fe-7c68-4a1d-9bb6-b0f1f5d45526
- https://www.data.gouv.fr/fr/datasets/r/08b77510-39c4-4761-bf02-19457264790f
- https://www.data.gouv.fr/fr/datasets/r/96aadc9f-0b55-4e9a-a70e-c627ed97e6f7
   
SETUP : We’ll use the vehicles database.

Import the data with the pandas library: `pd.read_csv(url)`

In [2]:
df = pd.read_csv("https://www.data.gouv.fr/fr/datasets/r/be2191a6-a7cd-446f-a9fc-8d698688eb9e")
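`pd.read_csv` also accepts a local path or a file-like object, so the loading step can be tried without a network connection. The following is a minimal sketch, assuming a tiny made-up sample that reuses the column names of the vehicles file; the values are illustrative and are not taken from the dataset.

```python
import io

import pandas as pd

# Made-up sample mimicking the layout of the vehicles file (illustrative values only)
sample_csv = io.StringIO(
    "Num_Acc,senc,catv,occutc,obs,obsm,choc,manv,num_veh\n"
    "201600000001,0,7,0,0,0,1,1,B02\n"
    "201600000001,0,2,0,0,0,7,15,A01\n"
    "201600000002,,7,0,6,0,1,1,A01\n"
)

# read_csv accepts a URL, a local path, or a file-like object
sample_df = pd.read_csv(sample_csv)

print(sample_df.shape)   # (rows, columns)
print(sample_df.dtypes)  # an integer column with a missing value is read as float64
```

The empty `senc` field in the last row becomes `NaN`, which is why that column is read as `float64` rather than `int64`.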

<a name="exercise-5"></a>

### Exercise 1 : Explore Pandas DataFrame basic functions
---

1) How many columns do we have in the DataFrame ? How many rows do we have ?

In [3]:
print("Number of columns:", df.shape[1])
print("Number of rows:", df.shape[0])

Number of columns: 9
Number of rows: 101924


2) What are the types of columns in the DataFrame ? Use `info()` function.

In [6]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 101924 entries, 0 to 101923
Data columns (total 9 columns):
 #   Column   Non-Null Count   Dtype  
---  ------   --------------   -----  
 0   Num_Acc  101924 non-null  int64  
 1   senc     101852 non-null  float64
 2   catv     101924 non-null  int64  
 3   occutc   101924 non-null  int64  
 4   obs      101886 non-null  float64
 5   obsm     101874 non-null  float64
 6   choc     101913 non-null  float64
 7   manv     101917 non-null  float64
 8   num_veh  101924 non-null  object 
dtypes: float64(5), int64(3), object(1)
memory usage: 7.0+ MB


3) Calculate the mean of numeric columns.

In [7]:
# numeric_only=True skips the text column num_veh
mean_values = df.mean(numeric_only=True)

print(mean_values)

Num_Acc    2.016000e+11
senc       1.041433e+00
catv       1.204036e+01
occutc     7.061144e-02
obs        8.706299e-01
obsm       1.594941e+00
choc       2.857947e+00
manv       5.703896e+00
dtype: float64




4) Calculate the maximum and minimum value of numeric column.

In [8]:
print('maximum value = ', df.max(numeric_only=True))
print('minimum value = ', df.min(numeric_only=True))

maximum value =  Num_Acc    2.016001e+11
senc       2.000000e+00
catv       9.900000e+01
occutc     3.000000e+02
obs        1.600000e+01
obsm       9.000000e+00
choc       9.000000e+00
manv       2.400000e+01
dtype: float64
minimum value =  Num_Acc    2.016000e+11
senc       0.000000e+00
catv       1.000000e+00
occutc     0.000000e+00
obs        0.000000e+00
obsm       0.000000e+00
choc       0.000000e+00
manv       0.000000e+00
dtype: float64
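Several statistics can also be gathered in a single call with `DataFrame.agg`. The following is a minimal sketch on a made-up frame (the values are illustrative), assuming only the numeric columns are of interest.

```python
import pandas as pd

# Made-up frame with two numeric columns and one text column
toy = pd.DataFrame({
    "catv": [7, 2, 33],
    "choc": [1.0, 7.0, None],
    "num_veh": ["B02", "A01", "A01"],
})

# Restrict to numeric columns, then compute several statistics at once
summary = toy.select_dtypes("number").agg(["min", "max", "mean"])
print(summary)
```

The result has one row per statistic and one column per numeric column, and missing values are skipped when each statistic is computed.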


5) Use `describe()` function, what can you observe ?

In [9]:
df.describe()

| | Num_Acc | senc | catv | occutc | obs | obsm | choc | manv |
|---|---|---|---|---|---|---|---|---|
| count | 101924.0 | 101852.0 | 101924.0 | 101924.0 | 101886.0 | 101874.0 | 101913.0 | 101917.0 |
| mean | 201600000000.0 | 1.041433 | 12.040363 | 0.070611 | 0.87063 | 1.594941 | 2.857947 | 5.703896 |
| std | 17118.04 | 0.747403 | 11.028127 | 2.221603 | 2.931908 | 1.145935 | 2.476565 | 7.042847 |
| min | 201600000000.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 25% | 201600000000.0 | 0.0 | 7.0 | 0.0 | 0.0 | 1.0 | 1.0 | 1.0 |
| 50% | 201600000000.0 | 1.0 | 7.0 | 0.0 | 0.0 | 2.0 | 2.0 | 1.0 |
| 75% | 201600000000.0 | 2.0 | 10.0 | 0.0 | 0.0 | 2.0 | 4.0 | 13.0 |
| max | 201600100000.0 | 2.0 | 99.0 | 300.0 | 16.0 | 9.0 | 9.0 | 24.0 |


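Answer : The count is below 101924 for `senc`, `obs`, `obsm`, `choc` and `manv`, so these columns contain missing values. `occutc` is 0 up to the 75th percentile but reaches 300, so a few rows carry very large values. `Num_Acc` is an identifier and most other columns are category codes, so their mean and standard deviation have no direct interpretation.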

6) Display the first 30 values, then last 30.

In [63]:
df.iloc[0:30]



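| | Num_Acc | senc | catv | occutc | obs | obsm | choc | manv | num_veh | is_public_transport |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 201600000001 | 0.0 | 7 | 0 | 0.0 | 0.0 | 1.0 | 1.0 | B02 | False |
| 1 | 201600000001 | 0.0 | 2 | 0 | 0.0 | 0.0 | 7.0 | 15.0 | A01 | False |
| 2 | 201600000002 | 0.0 | 7 | 0 | 6.0 | 0.0 | 1.0 | 1.0 | A01 | False |
| 3 | 201600000003 | 0.0 | 7 | 0 | 0.0 | 1.0 | 6.0 | 1.0 | A01 | False |
| 4 | 201600000004 | 0.0 | 32 | 0 | 0.0 | 0.0 | 1.0 | 1.0 | B02 | False |
| 5 | 201600000004 | 0.0 | 7 | 0 | 0.0 | 0.0 | 8.0 | 15.0 | A01 | False |
| 6 | 201600000005 | 0.0 | 30 | 0 | 0.0 | 2.0 | 1.0 | 15.0 | B02 | False |
| 7 | 201600000005 | 0.0 | 7 | 0 | 0.0 | 2.0 | 3.0 | 1.0 | A01 | False |
| 8 | 201600000006 | 0.0 | 7 | 0 | 0.0 | 1.0 | 1.0 | 1.0 | A01 | False |
| 9 | 201600000007 | 1.0 | 30 | 0 | 0.0 | 0.0 | 3.0 | 15.0 | A01 | False |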


In [64]:
df.tail(30)

| | Num_Acc | senc | catv | occutc | obs | obsm | choc | manv | num_veh | is_public_transport |
|---|---|---|---|---|---|---|---|---|---|---|
| 101894 | 201600059412 | 2.0 | 32 | 0 | 0.0 | 2.0 | 1.0 | 1.0 | B01 | False |
| 101895 | 201600059413 | 1.0 | 30 | 0 | 14.0 | 0.0 | 8.0 | 15.0 | A01 | False |
| 101896 | 201600059414 | 1.0 | 7 | 0 | 0.0 | 2.0 | 8.0 | 1.0 | A01 | False |
| 101897 | 201600059414 | 2.0 | 30 | 0 | 0.0 | 2.0 | 1.0 | 15.0 | B01 | False |
| 101898 | 201600059415 | 2.0 | 36 | 0 | 0.0 | 1.0 | 1.0 | 17.0 | A01 | False |
| 101899 | 201600059416 | 2.0 | 7 | 0 | 0.0 | 1.0 | 3.0 | 1.0 | A01 | False |
| 101900 | 201600059417 | 1.0 | 7 | 0 | 0.0 | 2.0 | 1.0 | 1.0 | A01 | False |
| 101901 | 201600059417 | 2.0 | 7 | 0 | 0.0 | 0.0 | 0.0 | 24.0 | Z01 | False |
| 101902 | 201600059418 | 2.0 | 7 | 0 | 0.0 | 1.0 | 3.0 | 1.0 | A01 | False |
| 101903 | 201600059419 | 1.0 | 7 | 0 | 0.0 | 1.0 | 0.0 | 0.0 | Z01 | False |


<a name="exercise-6"></a>

### Exercise 2 : Explore traffic accidents vehicles
---

Read the documentation to understand the meanings of each column.

Documentation: https://www.data.gouv.fr/fr/datasets/r/6cade01c-f69d-4779-b0a4-20606069888f

1) Then display the column names and rename them with names of your choice. For the following questions, we use these column names:
```
[
"Num_Acc", "sens_de_Circulation", "catV", "nb_occupants", "obstacle_fixe", "obstacle_mobile", "choc", "manoeuvre" , "num_vehicule"
]
```

In [10]:
# Original column names of the vehicles file and their corresponding new names
# (Num_Acc and choc keep their names)
rename_dict = {
    "senc": "sens_de_Circulation",
    "catv": "catV",
    "occutc": "nb_occupants",
    "obs": "obstacle_fixe",
    "obsm": "obstacle_mobile",
    "manv": "manoeuvre",
    "num_veh": "num_vehicule"
}

# Rename on a copy: the cells below keep using the original names in df
renamed_df = df.rename(columns=rename_dict)

# Display the column names of df, which are unchanged
print(df.columns)


Index(['Num_Acc', 'senc', 'catv', 'occutc', 'obs', 'obsm', 'choc', 'manv',
       'num_veh'],
      dtype='object')


2) Get the number of null values by column. Then display their proportion in the DataFrame.
For example: 0.08% of col_1 are null

In [12]:
# Calculate the number of null values by column
null_counts = df.isnull().sum()

# Calculate the percentage of null values by column
null_proportions = null_counts / len(df) * 100

# Display the results
for column, proportion in null_proportions.items():
    print(f"{proportion:.2f}% of {column} are null")


0.00% of Num_Acc are null
0.07% of senc are null
0.00% of catv are null
0.00% of occutc are null
0.04% of obs are null
0.05% of obsm are null
0.01% of choc are null
0.01% of manv are null
0.00% of num_veh are null


#### Exploratory Data Analysis - Filters

3) Select `obstacle fixe` column, what is the type of returned value ?

In [15]:
# Select the 'obs' column
obs_column = df['obs']

# Determine the type of the returned value
column_type = type(obs_column)

print(column_type)


<class 'pandas.core.series.Series'>


4) Convert the returned value to a NumPy array

In [17]:
import numpy as np

# Select the 'obs' column
obs_column = df['obs']

# Convert the Series to a NumPy array
obs_array = obs_column.values

# Verify the type of the returned value
print(type(obs_array))


<class 'numpy.ndarray'>


5) Compute the maximum value of the NumPy array

In [62]:
# obs_array contains NaN, so obs_array.max() returns nan; np.nanmax ignores NaN
max_value = np.nanmax(obs_array)
print(max_value)


16.0
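pandas and NumPy treat missing values differently in reductions: pandas skips `NaN` by default, while a plain NumPy reduction propagates it. The following is a minimal sketch on made-up values that compares the three options.

```python
import numpy as np
import pandas as pd

values = pd.Series([0.0, 6.0, np.nan, 16.0])  # made-up values

series_max = values.max()                      # pandas skips NaN by default (skipna=True)
array_max = values.to_numpy().max()            # NumPy propagates NaN
nan_aware_max = np.nanmax(values.to_numpy())   # NumPy's NaN-ignoring variant

print(series_max, array_max, nan_aware_max)
```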


6) Select the `catV` & `sens_de_Circulation` columns together

In [19]:
selected_columns = df[['catv', 'senc']]
selected_columns


| | catv | senc |
|---|---|---|
| 0 | 7 | 0.0 |
| 1 | 2 | 0.0 |
| 2 | 7 | 0.0 |
| 3 | 7 | 0.0 |
| 4 | 32 | 0.0 |
| ... | ... | ... |
| 101919 | 30 | 1.0 |
| 101920 | 30 | 2.0 |
| 101921 | 30 | 1.0 |
| 101922 | 2 | 1.0 |


7) Select all the columns containing ‘obstacle’ in their name

In [20]:
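# df still uses the original names, where the obstacle columns are 'obs' and 'obsm'
df[[column for column in df.columns if 'obs' in column]]

| | obs | obsm |
|---|---|---|
| 0 | 0.0 | 0.0 |
| 1 | 0.0 | 0.0 |
| 2 | 6.0 | 0.0 |
| 3 | 0.0 | 1.0 |
| 4 | 0.0 | 0.0 |
| ... | ... | ... |
| 101919 | 0.0 | 2.0 |
| 101920 | 0.0 | 2.0 |
| 101921 | 0.0 | 2.0 |
| 101922 | 0.0 | 2.0 |
| 101923 | 0.0 | 2.0 |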
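When the columns do carry the renamed labels, `DataFrame.filter` offers a shorter way to select by substring. The following is a minimal sketch, assuming a made-up frame that uses the new names from question 1.

```python
import pandas as pd

# Made-up frame using the renamed column labels
toy = pd.DataFrame({
    "catV": [7, 2],
    "obstacle_fixe": [0.0, 6.0],
    "obstacle_mobile": [2.0, 0.0],
})

# Keep the columns whose label contains the substring "obstacle"
obstacle_columns = toy.filter(like="obstacle")
print(obstacle_columns.columns.tolist())
```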


8) Select all rows with null values for 'sens_de_Circulation' column

In [22]:
rows_with_null_senc = df[df['senc'].isnull()]
rows_with_null_senc


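| | Num_Acc | senc | catv | occutc | obs | obsm | choc | manv | num_veh |
|---|---|---|---|---|---|---|---|---|---|
| 4153 | 201600002519 | NaN | 1 | 0 | 0.0 | 2.0 | 8.0 | 2.0 | B02 |
| 4504 | 201600002732 | NaN | 7 | 0 | 0.0 | 2.0 | 8.0 | 1.0 | A01 |
| 8407 | 201600005135 | NaN | 7 | 0 | 0.0 | 2.0 | 2.0 | 9.0 | A01 |
| 12686 | 201600007810 | NaN | 7 | 0 | 0.0 | 2.0 | 2.0 | 19.0 | B02 |
| 14845 | 201600009146 | NaN | 1 | 0 | NaN | NaN | NaN | NaN | B01 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 74516 | 201600044276 | NaN | 7 | 0 | 0.0 | 2.0 | 3.0 | 1.0 | A01 |
| 74612 | 201600044328 | NaN | 7 | 0 | 0.0 | 2.0 | 3.0 | 1.0 | A01 |
| 74771 | 201600044419 | NaN | 7 | 0 | 0.0 | 2.0 | 3.0 | 1.0 | B01 |
| 74814 | 201600044448 | NaN | 7 | 0 | 0.0 | 2.0 | 0.0 | 1.0 | Z01 |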


9) Sort the data frame using the accident number column

In [23]:
sorted_df = df.sort_values(by='Num_Acc')
sorted_df


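| | Num_Acc | senc | catv | occutc | obs | obsm | choc | manv | num_veh |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 201600000001 | 0.0 | 7 | 0 | 0.0 | 0.0 | 1.0 | 1.0 | B02 |
| 1 | 201600000001 | 0.0 | 2 | 0 | 0.0 | 0.0 | 7.0 | 15.0 | A01 |
| 2 | 201600000002 | 0.0 | 7 | 0 | 6.0 | 0.0 | 1.0 | 1.0 | A01 |
| 3 | 201600000003 | 0.0 | 7 | 0 | 0.0 | 1.0 | 6.0 | 1.0 | A01 |
| 4 | 201600000004 | 0.0 | 32 | 0 | 0.0 | 0.0 | 1.0 | 1.0 | B02 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 101919 | 201600059430 | 1.0 | 30 | 0 | 0.0 | 2.0 | 1.0 | 17.0 | B01 |
| 101920 | 201600059431 | 2.0 | 30 | 0 | 0.0 | 2.0 | 7.0 | 15.0 | A01 |
| 101921 | 201600059431 | 1.0 | 30 | 0 | 0.0 | 2.0 | 1.0 | 1.0 | B01 |
| 101922 | 201600059432 | 1.0 | 2 | 0 | 0.0 | 2.0 | 7.0 | 1.0 | A01 |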


10) Sort the values using the accident number column, then the vehicle category column

In [61]:
sorted_df = df.sort_values(by=['Num_Acc', 'catv'])
print(sorted_df)

             Num_Acc  senc  catv  occutc  obs  obsm  choc  manv num_veh  \
1       201600000001   0.0     2       0  0.0   0.0   7.0  15.0     A01   
0       201600000001   0.0     7       0  0.0   0.0   1.0   1.0     B02   
2       201600000002   0.0     7       0  6.0   0.0   1.0   1.0     A01   
3       201600000003   0.0     7       0  0.0   1.0   6.0   1.0     A01   
5       201600000004   0.0     7       0  0.0   0.0   8.0  15.0     A01   
...              ...   ...   ...     ...  ...   ...   ...   ...     ...   
101919  201600059430   1.0    30       0  0.0   2.0   1.0  17.0     B01   
101920  201600059431   2.0    30       0  0.0   2.0   7.0  15.0     A01   
101921  201600059431   1.0    30       0  0.0   2.0   1.0   1.0     B01   
101922  201600059432   1.0     2       0  0.0   2.0   7.0   1.0     A01   
101923  201600059432   2.0     7       0  0.0   2.0   1.0   9.0     B01   

        is_public_transport  
1                     False  
0                     False  
2        

11) Select the 4153rd row. Then select the row with index label '4153'. What is the difference between the two?

In [25]:
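# iloc is position-based and 0-indexed, so the 4153rd row is at position 4152
row_4153_by_position = df.iloc[4152]
# loc is label-based: with the default RangeIndex, label 4153 is the 4154th row
row_4153_by_index = df.loc[4153]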
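The distinction is easier to see when the index labels do not coincide with the row positions. The following is a minimal sketch on a made-up frame with labels 10, 20 and 30.

```python
import pandas as pd

# Made-up frame whose index labels do not match the row positions
toy = pd.DataFrame({"catv": [7, 2, 30]}, index=[10, 20, 30])

by_position = toy.iloc[1]  # second row, whatever its label
by_label = toy.loc[20]     # row whose index label is 20

print(by_position["catv"], by_label["catv"])
```

Here both calls reach the same row, but `toy.loc[1]` would raise a `KeyError` because no row carries the label 1.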


12)  Set Num_Acc as index

a. Make sure to use `inplace=True` to save the change in the DataFrame



In [26]:
df.set_index('Num_Acc', inplace=True)

b. Can you still get the row having index '4153'? Explain.



In [27]:
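# No: the index now holds accident numbers, so df.loc[4153] raises a KeyError
# unless '4153' is an accident number in the 'Num_Acc' column.
# The row is still reachable by position with df.iloc[4153].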


c. Finally, restore the index as it was before.

In [66]:
df.reset_index(inplace=True)
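The full round trip can be checked on a small example. The following is a minimal sketch on a made-up frame (the accident numbers are illustrative): a label lookup fails once `Num_Acc` is the index, positional access still works, and `reset_index` restores the original layout.

```python
import pandas as pd

toy = pd.DataFrame({
    "Num_Acc": [201600000001, 201600000002, 201600000003],
    "catv": [7, 2, 30],
})

indexed = toy.set_index("Num_Acc")

# Label 1 is not an accident number, so a label lookup fails
try:
    indexed.loc[1]
    lookup_failed = False
except KeyError:
    lookup_failed = True

# Positional access is unaffected by the index change
second_row = indexed.iloc[1]

# reset_index moves Num_Acc back into a regular column
restored = indexed.reset_index()
print(lookup_failed, second_row["catv"], restored.columns.tolist())
```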

#### Exploratory Data Analysis - Queries

13) Select the accidents that involve vehicle number W23

In [60]:
accidents_with_W23 = df[df['num_veh'] == 'W23']
print(accidents_with_W23)

           Num_Acc  senc  catv  occutc  obs  obsm  choc  manv num_veh  \
4073  201600002473   2.0     7       0  0.0   0.0   0.0   1.0     W23   

      is_public_transport  
4073                False  


14) Select the top 5 accidents that caused the most damage

In [59]:
top_5_damages = df.sort_values(by='choc', ascending=False).head(5)
print(top_5_damages)

            Num_Acc  senc  catv  occutc   obs  obsm  choc  manv num_veh  \
80633  201600047347   2.0     7       0   0.0   2.0   9.0   2.0     C01   
44309  201600026191   2.0    33       0   5.0   0.0   9.0  15.0     A01   
2356   201600001452   2.0    33       0   0.0   0.0   9.0  13.0     A01   
21756  201600013123   2.0     7       0  12.0   0.0   9.0   1.0     A01   
57227  201600033917   0.0     7       0  14.0   0.0   9.0   0.0     A01   

       is_public_transport  
80633                False  
44309                False  
2356                 False  
21756                False  
57227                False  


15) Count the number of damaged vehicles, by vehicle category and number of occupants

In [58]:
damaged_vehicles_count = df.groupby(['catv', 'occutc']).size()
print(damaged_vehicles_count)

catv  occutc
1     0          4705
2     0          3424
3     0           436
7     0         64641
10    0          5584
                ...  
40    140           1
      150           2
      200           1
      210           1
99    0           202
Length: 121, dtype: int64


16) From the previous result, select vehicle category '40' with '140' occupants

In [57]:
selected_category = damaged_vehicles_count.loc[40, 140]
print(selected_category)

1
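Grouping by two columns produces a Series with a two-level index, which `.loc` can address with a full pair or with the first level only. The following is a minimal sketch on made-up rows (illustrative values).

```python
import pandas as pd

toy = pd.DataFrame({
    "catv":   [7, 7, 40, 40, 40],
    "occutc": [0, 0, 140, 150, 150],
})

# size() counts rows per (catv, occutc) pair; the result has a two-level MultiIndex
counts = toy.groupby(["catv", "occutc"]).size()

print(counts.loc[(40, 150)])  # one specific pair
print(counts.loc[40])         # every occutc value for category 40
```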


17) Calculate the number of occupants by vehicle category

In [55]:
occupants_by_category = df.groupby('catv')['occutc'].sum()
print(occupants_by_category)

catv
1        0
2        0
3        0
7        0
10       0
13       0
14       0
15       0
16       0
17       0
20       0
21       0
30       0
31       0
32       0
33       0
34       0
35       0
36       0
37    2743
38    1385
39     522
40    2547
99       0
Name: occutc, dtype: int64


18) Compute the number of damages per vehicle category

In [56]:
damages_per_category = df.groupby('catv')['choc'].sum()
print(damages_per_category)

catv
1      12779.0
2       8035.0
3       1264.0
7     189569.0
10     16836.0
13      1168.0
14      3154.0
15      3128.0
16       130.0
17      2085.0
20       312.0
21       672.0
30      9918.0
31      6244.0
32      5678.0
33     22229.0
34      3998.0
35        45.0
36       308.0
37      2195.0
38       744.0
39        49.0
40       300.0
99       422.0
Name: choc, dtype: float64


19) Select all the rows with a vehicle category between 37 and 40 inclusive. (These values correspond to public transport.)

In [54]:
public_transport_rows = df[(df['catv'] >= 37) & (df['catv'] <= 40)]
print(public_transport_rows)

             Num_Acc  senc  catv  occutc  obs  obsm  choc  manv num_veh  \
129     201600000083   1.0    39       0  0.0   2.0   3.0   1.0     B02   
224     201600000148   0.0    38       0  0.0   1.0   2.0   1.0     A01   
278     201600000184   0.0    37       0  1.0   1.0   5.0  16.0     A01   
292     201600000193   2.0    38       0  0.0   2.0   3.0   1.0     A01   
522     201600000330   0.0    39       1  0.0   2.0   1.0   1.0     B02   
...              ...   ...   ...     ...  ...   ...   ...   ...     ...   
101351  201600059075   1.0    37       0  0.0   2.0   3.0   1.0     D01   
101367  201600059084   2.0    37       0  0.0   2.0   1.0  15.0     B01   
101406  201600059103   2.0    38       0  0.0   9.0   0.0   0.0     B01   
101444  201600059125   1.0    37       0  0.0   1.0   0.0   4.0     A01   
101787  201600059339   1.0    37       1  0.0   1.0   2.0   1.0     A01   

        is_public_transport  
129                    True  
224                    True  
278      

20) Create a new column with a Boolean value, showing whether the accident happened in public transport or not

In [35]:
df['is_public_transport'] = df['catv'].between(37, 40)

21) Check that the new column works as intended. To do so, verify that the number of occupants is 0 for every vehicle outside public transport, unlike what the data shows for the public transport group.

In [36]:
non_public_transport_with_occupants = df[(~df['is_public_transport']) & (df['occutc'] != 0)]
assert len(non_public_transport_with_occupants) == 0
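The same check can be read off a grouped summary instead of an assertion. The following is a minimal sketch on made-up rows (illustrative values), assuming the flag is built with `between` as above.

```python
import pandas as pd

toy = pd.DataFrame({
    "catv":   [7, 37, 38, 40, 2],
    "occutc": [0, 12, 0, 140, 0],
})

# between() is inclusive on both ends by default
toy["is_public_transport"] = toy["catv"].between(37, 40)

# Largest occupant count in each group
max_occupants = toy.groupby("is_public_transport")["occutc"].max()
print(max_occupants)
```

A maximum of 0 in the `False` group means no vehicle outside public transport reports occupants.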

22) Create a new dataframe, which contains only public transport accidents

In [53]:
public_transport_df = df[df['is_public_transport']]
print(public_transport_df)

             Num_Acc  senc  catv  occutc  obs  obsm  choc  manv num_veh  \
129     201600000083   1.0    39       0  0.0   2.0   3.0   1.0     B02   
224     201600000148   0.0    38       0  0.0   1.0   2.0   1.0     A01   
278     201600000184   0.0    37       0  1.0   1.0   5.0  16.0     A01   
292     201600000193   2.0    38       0  0.0   2.0   3.0   1.0     A01   
522     201600000330   0.0    39       1  0.0   2.0   1.0   1.0     B02   
...              ...   ...   ...     ...  ...   ...   ...   ...     ...   
101351  201600059075   1.0    37       0  0.0   2.0   3.0   1.0     D01   
101367  201600059084   2.0    37       0  0.0   2.0   1.0  15.0     B01   
101406  201600059103   2.0    38       0  0.0   9.0   0.0   0.0     B01   
101444  201600059125   1.0    37       0  0.0   1.0   0.0   4.0     A01   
101787  201600059339   1.0    37       1  0.0   1.0   2.0   1.0     A01   

        is_public_transport  
129                    True  
224                    True  
278      

23) Obtain the average number of occupants by vehicle category, and sort the result

In [38]:
avg_occupants_by_category = df.groupby('catv')['occutc'].mean().sort_values(ascending=False)

24) Map the vehicle category, obstacles (fixed and mobile) using the dictionary defined in mapping.py (you can use the mapping.py file, but you can also create your own mappings).

In [40]:
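# Installs the PyPI package named `mapping`; this is presumably not the course's mapping.py,
# which only needs to sit next to the notebook to be importable
pip install mapping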


In [41]:
from mapping import *

In [42]:
# df['catv'] = df['catv'].map(catv_mapping)
# df['obs'] = df['obs'].map(obs_mapping)
# df['obsm'] = df['obsm'].map(obsm_mapping)
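The mapping step itself relies on `Series.map` with a dictionary. The following is a minimal sketch, assuming a hypothetical excerpt of a category dictionary; the name `catv_labels` and its labels are illustrative and should be checked against the documentation or the course's mapping.py.

```python
import pandas as pd

# Hypothetical excerpt of a category mapping; check the labels against the documentation
catv_labels = {7: "light vehicle", 37: "bus", 38: "coach"}

toy = pd.DataFrame({"catv": [7, 37, 38, 99]})

# map() looks each value up in the dictionary; codes missing from it become NaN
toy["catv_label"] = toy["catv"].map(catv_labels)

# Keep the original code as text when no label is known
toy["catv_label"] = toy["catv_label"].fillna(toy["catv"].astype(str))
print(toy)
```

Writing the labels to a new column rather than overwriting `catv` keeps the numeric codes available for filters such as `between(37, 40)`.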


#### Heat Map - Vehicle Category vs Accident Moving Obstacle

25) Create a new DataFrame that crosses the vehicle category with the moving obstacle.

The values of the DataFrame are the sum of the occupant numbers for each combination.

This DataFrame can be read as a heat map of accidents by type of transport and moving obstacle.

In [47]:
heatmap_df = df.pivot_table(values='occutc', index='catv', columns='obsm', aggfunc='sum', fill_value=0)
print(heatmap_df)

obsm  0.0  1.0   2.0  4.0  5.0  6.0  9.0
catv                                    
1       0    0     0    0    0    0    0
2       0    0     0    0    0    0    0
3       0    0     0    0    0    0    0
7       0    0     0    0    0    0    0
10      0    0     0    0    0    0    0
13      0    0     0    0    0    0    0
14      0    0     0    0    0    0    0
15      0    0     0    0    0    0    0
16      0    0     0    0    0    0    0
17      0    0     0    0    0    0    0
20      0    0     0    0    0    0    0
21      0    0     0    0    0    0    0
30      0    0     0    0    0    0    0
31      0    0     0    0    0    0    0
32      0    0     0    0    0    0    0
33      0    0     0    0    0    0    0
34      0    0     0    0    0    0    0
35      0    0     0    0    0    0    0
36      0    0     0    0    0    0    0
37    520  784  1313    0    0    0  126
38    256  210   917    0    0    0    2
39    300   34   187    1    0    0    0
40    171  883  
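How `pivot_table` builds each cell is easier to follow on a handful of rows. The following is a minimal sketch on made-up data (illustrative values), using the same arguments as the cell above.

```python
import pandas as pd

toy = pd.DataFrame({
    "catv":   [37, 37, 38, 38, 7],
    "obsm":   [1.0, 2.0, 2.0, 2.0, 2.0],
    "occutc": [10, 5, 3, 4, 0],
})

# One row per catv, one column per obsm, each cell the sum of occutc;
# combinations that never occur are filled with 0
toy_heatmap = toy.pivot_table(
    values="occutc", index="catv", columns="obsm", aggfunc="sum", fill_value=0
)
print(toy_heatmap)
```

The two rows with `catv` 38 and `obsm` 2.0 are added together into a single cell, while the pair (38, 1.0), which never occurs, is filled with 0.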

26) Using the previous results, obtain the most dangerous mode of transport for the public.

In [45]:
most_dangerous_transport = heatmap_df.loc[37:40].sum(axis=1).idxmax()
print(most_dangerous_transport)

37


27) Sort the heatmap from the most dangerous to the least dangerous means of transport.

In [48]:
sorted_heatmap_by_transport = heatmap_df.sum(axis=1).sort_values(ascending=False)
print(sorted_heatmap_by_transport)

catv
37    2743
40    2547
38    1385
39     522
1        0
2        0
36       0
35       0
34       0
33       0
32       0
31       0
30       0
21       0
20       0
17       0
16       0
15       0
14       0
13       0
10       0
7        0
3        0
99       0
dtype: int64


28) Get the most dangerous moving obstacle when using public transportation.

In [49]:
most_dangerous_obstacle = heatmap_df.loc[37:40].sum().idxmax()


29) Sort the heat map from the most dangerous to the least dangerous entity (means of transport, and moving obstacle)

In [52]:
sorted_heatmap = heatmap_df.stack().sort_values(ascending=False)
print(sorted_heatmap)

catv  obsm
40    2.0     1483
37    2.0     1313
38    2.0      917
40    1.0      883
37    1.0      784
              ... 
16    1.0        0
      2.0        0
      4.0        0
      5.0        0
99    9.0        0
Length: 168, dtype: int64
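Once the table is stacked, the top entry can also be read directly with `idxmax`, which returns the (category, obstacle) pair. The following is a minimal sketch on a made-up 3x2 table (illustrative values).

```python
import pandas as pd

# Made-up heat map: rows are vehicle categories, columns are moving obstacles
toy_heatmap = pd.DataFrame(
    [[0, 0], [10, 5], [0, 7]],
    index=pd.Index([7, 37, 38], name="catv"),
    columns=pd.Index([1.0, 2.0], name="obsm"),
)

# stack() turns the table into a Series indexed by (catv, obsm) pairs
pairs = toy_heatmap.stack()

# idxmax() on that Series returns the pair holding the largest value
top_pair = pairs.idxmax()
print(top_pair)
```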
