# Results
The objective of this notebook is to analyse the data used in our database in order to see which statistics can be utilised.  
Let's start by importing the pandas library.

While trying to read the csv file, I received the following error:  
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xcc in position 4: invalid continuation byte  

So the csv file is not encoded as UTF-8.  
I found the following information on the issue: [Unicode Error](https://stackoverflow.com/questions/18171739/unicodedecodeerror-when-reading-csv-file-in-pandas-with-python)  
By using ISO-8859-1 encoding instead of UTF-8, I was able to import the csv file as a pandas dataframe.
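
The same workaround can be written as a small helper that tries UTF-8 first and only falls back to ISO-8859-1 when decoding fails. This is a minimal sketch: the function name `read_csv_with_fallback` and the order of the encodings are assumptions made for illustration, not part of the original notebook.

```python
import pandas as pd


def read_csv_with_fallback(path, encodings=("utf-8", "ISO-8859-1"), **kwargs):
    """Try each encoding in turn and return the first dataframe that decodes."""
    last_error = None
    for encoding in encodings:
        try:
            return pd.read_csv(path, encoding=encoding, **kwargs)
        except UnicodeDecodeError as error:
            # Remember the failure and move on to the next candidate encoding.
            last_error = error
    # Every candidate failed, so surface the last decoding error.
    raise last_error
```

It could be called as `read_csv_with_fallback('results.csv', header=0, index_col=None)`. Note that ISO-8859-1 maps every possible byte to a character, so it never raises a decoding error; if the file was actually written in a different encoding, some characters may still come out wrong.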

Added results to the git repo


In [223]:
import pandas as pd

resultsCSV = 'results.csv'
resultsDF  = pd.read_csv(resultsCSV, header=0, index_col=None, encoding = "ISO-8859-1")
resultsDF

Unnamed: 0,resultId,raceId,driverId,constructorId,number,grid,position,positionText,positionOrder,points,laps,time,milliseconds,fastestLap,rank,fastestLapTime,fastestLapSpeed,statusId
0,1,18,1,1,22.0,1,1.0,1,1,10.0,58,34:50.6,5690616.0,39.0,2.0,01:27.5,218.3,1
1,2,18,2,2,3.0,5,2.0,2,2,8.0,58,5.478,5696094.0,41.0,3.0,01:27.7,217.586,1
2,3,18,3,3,7.0,7,3.0,3,3,6.0,58,8.163,5698779.0,41.0,5.0,01:28.1,216.719,1
3,4,18,4,4,5.0,11,4.0,4,4,5.0,58,17.181,5707797.0,58.0,7.0,01:28.6,215.464,1
4,5,18,5,1,23.0,3,5.0,5,5,4.0,58,18.014,5708630.0,43.0,1.0,01:27.4,218.385,1
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
23772,23777,988,842,5,10.0,17,16.0,16,16,0.0,54,,,33.0,16.0,01:43.8,192.542,11
23773,23778,988,828,15,9.0,19,17.0,17,17,0.0,54,,,36.0,15.0,01:43.6,193.057,11
23774,23779,988,840,3,18.0,15,18.0,18,18,0.0,54,,,52.0,6.0,01:42.3,195.402,11
23775,23780,988,832,4,55.0,12,,R,19,0.0,31,,,26.0,14.0,01:43.4,193.41,36


From the dataframe above, we can see the points, laps, time, milliseconds and fastestLap columns, which contain rich information about each race result.  
There are 23776 rows, so this dataframe contains a lot of information.  

We can also see foreign keys such as raceId, driverId and constructorId, which reference other datasets.

Let's see if we can find the fastest lap speed.  
This would normally be a straightforward operation with pandas dataframes, as we can just call the max() method on the desired column. The problem is that the data under fastestLapSpeed is not in the desired format, which for us would be either int or float.  
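
To illustrate why the column type matters, here is a small sketch with made-up text values (not read from results.csv). When speeds are stored as strings, max() compares them character by character rather than by numeric value.

```python
import pandas as pd

# Speeds stored as text, the way an object column can hold them.
speeds_as_text = pd.Series(["218.3", "89.54", "257.32"], dtype=object)

# Strings are compared character by character, so "8" sorts after "2".
text_max = speeds_as_text.max()

# After conversion to numbers the comparison is numeric.
numeric_max = pd.to_numeric(speeds_as_text).max()

print(text_max)     # 89.54
print(numeric_max)  # 257.32
```

The text-based maximum is the slowest of the three speeds, which is why the column needs to be converted before min() and max() give meaningful answers.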

In [224]:
resultsDF.dtypes

resultId             int64
raceId               int64
driverId             int64
constructorId        int64
number             float64
grid                 int64
position           float64
positionText        object
positionOrder        int64
points             float64
laps                 int64
time                object
milliseconds       float64
fastestLap         float64
rank               float64
fastestLapTime      object
fastestLapSpeed     object
statusId             int64
dtype: object

Above we can see that the fastestLapSpeed column is actually of type object.  
Let's first convert the entire dataframe to the most appropriate datatypes; this way we'll get rid of the object types.  
We can use the [convert_dtypes](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.convert_dtypes.html#pandas.DataFrame.convert_dtypes) method to do this.
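
As a rough illustration of what convert_dtypes does, the sketch below applies it to a tiny hand-made dataframe. The two columns borrow names from results.csv, but the values are invented for the example.

```python
import pandas as pd

sample = pd.DataFrame({
    # Whole numbers that pandas stores as floats because one value is missing.
    "number": [22.0, 3.0, None],
    # Text values, including a missing one.
    "positionText": ["1", "R", None],
})

converted = sample.convert_dtypes()

print(sample.dtypes)
print(converted.dtypes)
```

The float column holding only whole numbers becomes the nullable Int64 type, and the text column becomes a string type, while the missing values are kept as missing.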

In [225]:
#pd.to_numeric(resultsDF["fastestLapSpeed"], errors='raise', downcast=None)
dfn = resultsDF.convert_dtypes()
dfn.dtypes

resultId             Int64
raceId               Int64
driverId             Int64
constructorId        Int64
number               Int64
grid                 Int64
position             Int64
positionText        string
positionOrder        Int64
points             float64
laps                 Int64
time                string
milliseconds         Int64
fastestLap           Int64
rank                 Int64
fastestLapTime      string
fastestLapSpeed     string
statusId             Int64
dtype: object

We can see that the fastestLapSpeed column has now changed from an object type to string.  
That's good, since we can now convert from string to float using the [pandas to_numeric](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.to_numeric.html#pandas.to_numeric) function.
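
The errors='coerce' option is worth a quick look, because it silently turns anything that cannot be parsed into NaN. The sketch below uses invented values; the "not recorded" entry is a hypothetical placeholder and is not taken from results.csv.

```python
import pandas as pd

raw_speeds = pd.Series(["218.3", "217.586", "not recorded", None], dtype=object)

# errors='coerce' replaces every value that cannot be parsed with NaN
# instead of raising an exception.
speeds = pd.to_numeric(raw_speeds, errors="coerce")

# Counting the NaN values shows how many entries were missing or unparseable.
unparsed = speeds.isna().sum()

print(speeds)
print("Missing or unparseable values:", unparsed)
```

Checking the NaN count after the conversion is a cheap way to see how much data was lost, since errors='coerce' does not report the values it could not parse.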

In [226]:
dfn["fastestLapSpeed"] = pd.to_numeric(dfn["fastestLapSpeed"], errors='coerce')

Let's just have another look at our dataframe after conversion.

In [227]:
dfn

Unnamed: 0,resultId,raceId,driverId,constructorId,number,grid,position,positionText,positionOrder,points,laps,time,milliseconds,fastestLap,rank,fastestLapTime,fastestLapSpeed,statusId
0,1,18,1,1,22,1,1,1,1,10.0,58,34:50.6,5690616,39,2,01:27.5,218.300,1
1,2,18,2,2,3,5,2,2,2,8.0,58,5.478,5696094,41,3,01:27.7,217.586,1
2,3,18,3,3,7,7,3,3,3,6.0,58,8.163,5698779,41,5,01:28.1,216.719,1
3,4,18,4,4,5,11,4,4,4,5.0,58,17.181,5707797,58,7,01:28.6,215.464,1
4,5,18,5,1,23,3,5,5,5,4.0,58,18.014,5708630,43,1,01:27.4,218.385,1
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
23772,23777,988,842,5,10,17,16,16,16,0.0,54,,,33,16,01:43.8,192.542,11
23773,23778,988,828,15,9,19,17,17,17,0.0,54,,,36,15,01:43.6,193.057,11
23774,23779,988,840,3,18,15,18,18,18,0.0,54,,,52,6,01:42.3,195.402,11
23775,23780,988,832,4,55,12,,R,19,0.0,31,,,26,14,01:43.4,193.410,36


Now that we've converted the column to float, we can use the min() and max() methods to extract the slowest and fastest speeds from the column.

In [228]:
print("Slowest Lap Speed:  " + str(dfn["fastestLapSpeed"].min()))
print("Fastest Lap Speed:  " + str(dfn["fastestLapSpeed"].max()))

Slowest Lap Speed:  89.54
Fastest Lap Speed:  257.32


The fastestLapTime is still of type string, which is not ideal when processing this data.  
We need to convert this column into a time format that we can use in order to extract meaningful data.

In [233]:
print("Variable type BEFORE conversion: ", type(dfn["fastestLapTime"][0]))
dfn["fastestLapTime"] = pd.to_datetime(dfn["fastestLapTime"], errors='raise', format= '%M:%S.%f')
print("Variable type AFTER conversion: ", type(dfn["fastestLapTime"][0]))

Variable type BEFORE conversion:  <class 'str'>
Variable type AFTER conversion:  <class 'pandas._libs.tslibs.timestamps.Timestamp'>


In [234]:
dfn

Unnamed: 0,resultId,raceId,driverId,constructorId,number,grid,position,positionText,positionOrder,points,laps,time,milliseconds,fastestLap,rank,fastestLapTime,fastestLapSpeed,statusId
0,1,18,1,1,22,1,1,1,1,10.0,58,34:50.6,5690616,39,2,1900-01-01 00:01:27.500,218.300,1
1,2,18,2,2,3,5,2,2,2,8.0,58,5.478,5696094,41,3,1900-01-01 00:01:27.700,217.586,1
2,3,18,3,3,7,7,3,3,3,6.0,58,8.163,5698779,41,5,1900-01-01 00:01:28.100,216.719,1
3,4,18,4,4,5,11,4,4,4,5.0,58,17.181,5707797,58,7,1900-01-01 00:01:28.600,215.464,1
4,5,18,5,1,23,3,5,5,5,4.0,58,18.014,5708630,43,1,1900-01-01 00:01:27.400,218.385,1
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
23772,23777,988,842,5,10,17,16,16,16,0.0,54,,,33,16,1900-01-01 00:01:43.800,192.542,11
23773,23778,988,828,15,9,19,17,17,17,0.0,54,,,36,15,1900-01-01 00:01:43.600,193.057,11
23774,23779,988,840,3,18,15,18,18,18,0.0,54,,,52,6,1900-01-01 00:01:42.300,195.402,11
23775,23780,988,832,4,55,12,,R,19,0.0,31,,,26,14,1900-01-01 00:01:43.400,193.410,36


As can be seen above, we've managed to convert the strings in the fastestLapTime column to datetime, but this has brought a year, month and day into the data as well (the default date 1900-01-01), which is not something we need. The original data only contained the time, so let's see if we can remove the year, month and day from the data but still retain the time format.  
We can remove the date from the datetime object; the only problem is that if we do that, we can no longer use methods like min() and max().  

See below what the problem is.  
We'll make a copy of the dataframe so that we don't mess up the original dataframe.

In [253]:
dfn1 = dfn.copy()
dfn1["fastestLapTime"] = dfn1["fastestLapTime"].dt.time
print("Time format is now of type: ", type(dfn1["fastestLapTime"][0]))
dfn1

Time format is now of type:  <class 'datetime.time'>


Unnamed: 0,resultId,raceId,driverId,constructorId,number,grid,position,positionText,positionOrder,points,laps,time,milliseconds,fastestLap,rank,fastestLapTime,fastestLapSpeed,statusId
0,1,18,1,1,22,1,1,1,1,10.0,58,34:50.6,5690616,39,2,00:01:27.500000,218.300,1
1,2,18,2,2,3,5,2,2,2,8.0,58,5.478,5696094,41,3,00:01:27.700000,217.586,1
2,3,18,3,3,7,7,3,3,3,6.0,58,8.163,5698779,41,5,00:01:28.100000,216.719,1
3,4,18,4,4,5,11,4,4,4,5.0,58,17.181,5707797,58,7,00:01:28.600000,215.464,1
4,5,18,5,1,23,3,5,5,5,4.0,58,18.014,5708630,43,1,00:01:27.400000,218.385,1
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
23772,23777,988,842,5,10,17,16,16,16,0.0,54,,,33,16,00:01:43.800000,192.542,11
23773,23778,988,828,15,9,19,17,17,17,0.0,54,,,36,15,00:01:43.600000,193.057,11
23774,23779,988,840,3,18,15,18,18,18,0.0,54,,,52,6,00:01:42.300000,195.402,11
23775,23780,988,832,4,55,12,,R,19,0.0,31,,,26,14,00:01:43.400000,193.410,36


From the table above, we can see that we now have the desired time format, but we can no longer use basic methods like min() and max(): the comparison is not supported between instances of datetime.time and float, as can be seen from the error below. The float most likely stems from the rows with a missing lap time, which pandas handles as floating-point values.

In [255]:
dfn1["fastestLapTime"].min()

TypeError: '<=' not supported between instances of 'datetime.time' and 'float'
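
If the datetime.time values are wanted anyway, one possible workaround is to drop the missing entries before comparing. This is a sketch on a small hand-made series, assuming that the missing values are what triggers the error; it is not the approach taken in the rest of this notebook.

```python
import datetime

import numpy as np
import pandas as pd

# Two lap times as datetime.time objects plus one missing value.
lap_times = pd.Series(
    [datetime.time(0, 1, 27, 500000), datetime.time(0, 1, 43, 800000), np.nan],
    dtype=object,
)

# datetime.time objects can be compared with each other,
# so min() and max() work once the missing value is removed.
fastest = lap_times.dropna().min()
slowest = lap_times.dropna().max()

print("Fastest:", fastest)
print("Slowest:", slowest)
```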

So instead of removing the date from the datetime values, let's keep it in and just print out the time.

In [256]:
print("Slowest Lap Time:  ", dfn["fastestLapTime"].max().time())
print("Fastest Lap Time:  ", dfn["fastestLapTime"].min().time())

Slowest Lap Time:   00:03:22.300000
Fastest Lap Time:   00:01:07.400000
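
Since a lap time is really a duration rather than a point in time, another option is to store it as a timedelta. The sketch below is an alternative to the approach above, shown on invented sample values: it parses the strings with the same format and then subtracts the default date 1900-01-01 that to_datetime fills in.

```python
import pandas as pd

lap_strings = pd.Series(["01:27.5", "01:43.8", None], dtype=object)

# Parse with the same format as above; the missing value becomes NaT.
parsed = pd.to_datetime(lap_strings, format="%M:%S.%f")

# Subtracting the default date leaves only the elapsed time as a Timedelta.
lap_durations = parsed - pd.Timestamp("1900-01-01")

# min() and max() skip the missing value, and the result carries no date.
fastest = lap_durations.min()
slowest_seconds = lap_durations.dt.total_seconds().max()

print("Fastest lap:", fastest)
print("Slowest lap in seconds:", slowest_seconds)
```

Durations stored this way can also be averaged or converted to seconds, which is harder to do with datetime.time objects.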


## Conclusion
This dataframe contains detailed information about the race results.  
There is a lot of data analysis that can be done on this dataset alone, but it should be combined with the other datasets, via the raceId, driverId and constructorId keys, to get a more comprehensive picture and to construct more informative statistical patterns.