The goal of this project is to analyze a dataset to determine whether driving conditions are related to an electric vehicle’s battery temperature. The expected model has two inputs, the ambient temperature and the trip distance, and one output, the battery temperature. The statistical method considered for this analysis is linear regression, and sampling will be applied to verify the conclusions drawn from the dataset.
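
To make the two-input model concrete, here is a minimal sketch of a linear regression of the form battery temperature = b0 + b1 * ambient temperature + b2 * trip distance. The data is synthetic and noise-free, the variable names are hypothetical, and the coefficients used to generate it are assumptions chosen for illustration; none of it comes from the dataset analyzed in this notebook.

```python
import numpy as np

# Synthetic, made-up values purely for illustration; not taken from EvBatTemps.xlsx.
rng = np.random.default_rng(0)
ambient_temp = rng.uniform(0.0, 35.0, size=50)  # hypothetical ambient temperature [°C]
distance_km = rng.uniform(1.0, 60.0, size=50)   # hypothetical trip distance [km]

# Assumed relationship, used only to generate a toy target without noise.
battery_temp = 5.0 + 0.8 * ambient_temp + 0.05 * distance_km

# Design matrix with an intercept column, then an ordinary least squares fit.
X = np.column_stack([np.ones_like(ambient_temp), ambient_temp, distance_km])
coef, residuals, rank, singular_values = np.linalg.lstsq(X, battery_temp, rcond=None)
intercept, b_ambient, b_distance = coef
print(intercept, b_ambient, b_distance)
```

Because the toy target has no noise, the fit should recover the assumed coefficients up to floating-point error. With real measurements the estimates would carry uncertainty, which is where the sampling step mentioned above comes in.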

In [1]:
# Import the libraries used in this notebook (numpy, pandas, matplotlib, seaborn, scipy, statsmodels)
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import scipy.stats
import math
import statistics as stats
import statsmodels.stats.api as sms
%matplotlib inline

## **Data Cleaning and Preparation:**
1. Import the data into a dataframe
2. Review the data
3. Determine which subset of the data the model will be built on
4. Use only the data needed to answer the question
5. Confirm that every column has a value in every row
6. If a column is missing values, sample the data so that its value count equals that of the other columns
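
The steps above can be sketched as a small helper. This is only an illustration on a made-up toy frame: the function name `clean_for_model` and the column names are hypothetical, and the sketch equalizes the value counts by dropping incomplete rows rather than by sampling.

```python
import numpy as np
import pandas as pd

def clean_for_model(df, keep_columns):
    """Keep only the requested columns and drop rows with any missing value."""
    subset = df[keep_columns]
    return subset.dropna(how="any").reset_index(drop=True)

# Toy frame standing in for the real spreadsheet (all values are made up).
toy = pd.DataFrame({
    "ambient": [25.5, np.nan, 21.5, 24.0],
    "distance": [7.4, np.nan, 12.8, 10.7],
    "battery_end": [22.0, np.nan, 25.0, 27.0],
    "note": [None, None, "changed", None],
})

cleaned = clean_for_model(toy, ["ambient", "distance", "battery_end"])

# After cleaning, every remaining column has the same non-null count.
print(cleaned.count())
```

Selecting the columns before calling `dropna` matters: a sparse column such as the toy `note` would otherwise cause complete rows to be discarded.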

In [2]:
#import data
evbattemp = pd.read_excel('EvBatTemps.xlsx')

In [3]:
#review the data
# view the first 5 records of each column
evbattemp.head()

Unnamed: 0,Trip,Date,Route/Area,Weather,Battery Temperature (Start) [°C],Battery Temperature (End),Battery State of Charge (Start),Battery State of Charge (End),Unnamed: 8,Ambient Temperature (Start) [°C],Target Cabin Temperature,Distance [km],Duration [min],Unnamed: 13,Fan,Note
0,TripA01,2019-06-25_13-21-14,Munich East,sunny,21.0,22.0,0.863,0.803,0.06,25.5,23.0,7.42769,16.82,,"Automatic, Level 1",
1,TripA02,2019-06-25_14-05-31,Munich East,sunny,23.0,26.0,0.803,0.673,0.13,32.0,23.0,23.509709,23.55,,"Automatic, Level 1",Target Cabin Temperature changed
2,TripA03,2019-06-28_10-02-15,Munich East,sunny,24.0,25.0,0.835,0.751,0.084,21.5,27.0,12.820846,11.18,,"Automatic, Level 1",Target Cabin Temperature changed
3,TripA04,2019-06-28_10-13-30,Munich East,sunny,25.0,27.0,0.751,0.667,0.084,24.0,22.0,10.727491,6.87,,"Automatic, Level 1",
4,TripA05,2019-06-28_10-20-26,Munich East,sunny,27.0,27.0,0.667,0.602,0.065,24.5,24.0,12.393223,22.776667,,"Automatic, Level 1",


In [4]:
#review the data
# confirm the data info
evbattemp.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 72 entries, 0 to 71
Data columns (total 16 columns):
 #   Column                            Non-Null Count  Dtype  
---  ------                            --------------  -----  
 0   Trip                              70 non-null     object 
 1   Date                              70 non-null     object 
 2   Route/Area                        70 non-null     object 
 3   Weather                           70 non-null     object 
 4   Battery Temperature (Start) [°C]  70 non-null     float64
 5   Battery Temperature (End)         70 non-null     float64
 6   Battery State of Charge (Start)   70 non-null     float64
 7   Battery State of Charge (End)     70 non-null     float64
 8   Unnamed: 8                        70 non-null     float64
 9   Ambient Temperature (Start) [°C]  70 non-null     float64
 10  Target Cabin Temperature          70 non-null     float64
 11  Distance [km]                     70 non-null     float64
 12  Duration [

- There are **72 observations** and **16 columns** in the data
- Some of the columns are of **numeric data type** while others are of **object data type**
- Although there are **72 observations**, only 70 of them are non-null in most columns; confirm that 2 rows are null and remove them
- The column "Unnamed: 13" has no values in it
- The column "Note" has only 26 non-null values out of 72 observations, far fewer than the other columns
- The column "Fan" contains the same repeated value, so it is redundant and can be dropped
- Columns "Trip", "Date", "Unnamed: 13" and "Note" can be dropped because they are not relevant to the question asked
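
Observations like these can be checked programmatically instead of being read off the `info()` output. The sketch below runs on a small made-up frame that only borrows a few column names from the dataset; the values and the variable names (`missing_per_column`, `empty_columns`, and so on) are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Toy frame (made-up values) mimicking the kinds of columns discussed above.
toy = pd.DataFrame({
    "Distance [km]": [7.4, 23.5, np.nan, 12.8],
    "Fan": ["Automatic, Level 1", "Automatic, Level 1", np.nan, "Automatic, Level 1"],
    "Unnamed: 13": [np.nan, np.nan, np.nan, np.nan],
    "Note": ["first note", "second note", np.nan, np.nan],
})

# Number of missing values in each column.
missing_per_column = toy.isna().sum()

# Columns with no values at all.
empty_columns = [c for c in toy.columns if toy[c].isna().all()]

# Columns holding a single distinct non-null value (candidates for dropping).
constant_columns = [c for c in toy.columns if toy[c].nunique(dropna=True) == 1]

# Index labels of rows in which every column is missing.
fully_empty_rows = toy.index[toy.isna().all(axis=1)].tolist()

print(missing_per_column)
print(empty_columns, constant_columns, fully_empty_rows)
```

Running the same checks on the real dataframe would confirm or refute each bullet before any column is dropped.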

In [5]:
# drop the 5 columns: 'Trip', 'Date', 'Note', 'Unnamed: 13', 'Fan'
evbattemp = evbattemp.drop(['Trip','Date','Note', 'Unnamed: 13', 'Fan'], axis=1)

In [6]:
# confirm null rows exist
evbattemp_null_only = evbattemp[evbattemp.isna().any(axis=1)]
evbattemp_null_only

Unnamed: 0,Route/Area,Weather,Battery Temperature (Start) [°C],Battery Temperature (End),Battery State of Charge (Start),Battery State of Charge (End),Unnamed: 8,Ambient Temperature (Start) [°C],Target Cabin Temperature,Distance [km],Duration [min]
32,,,,,,,,,,,
33,,,,,,,,,,,


In [7]:
#drop null rows
evbattemp = evbattemp.dropna(how='any',axis=0)
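
A note on the choice of `how='any'`: it removes a row when at least one column is missing, whereas `how='all'` removes a row only when every column is missing. The toy frame below (made-up values) sketches the difference and shows that `dropna` keeps the original index labels, so an optional `reset_index(drop=True)` is needed if a contiguous index is wanted.

```python
import numpy as np
import pandas as pd

# Toy frame: row 1 is fully empty, row 2 is only partly empty (values are made up).
toy = pd.DataFrame({
    "Distance [km]": [7.4, np.nan, 12.8],
    "Duration [min]": [16.8, np.nan, np.nan],
})

dropped_any = toy.dropna(how="any", axis=0)  # removes rows 1 and 2
dropped_all = toy.dropna(how="all", axis=0)  # removes only row 1

# The surviving rows keep their original labels (here 0 and 2 for dropped_all).
print(dropped_all.index.tolist())

# Optional: renumber the rows from 0 after dropping.
dropped_any = dropped_any.reset_index(drop=True)
print(dropped_any)
```

In this notebook the two null rows are empty in every column, so both settings would remove the same rows once the sparse columns have been dropped.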

In [8]:
# confirm null rows were removed
evbattemp.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 70 entries, 0 to 71
Data columns (total 11 columns):
 #   Column                            Non-Null Count  Dtype  
---  ------                            --------------  -----  
 0   Route/Area                        70 non-null     object 
 1   Weather                           70 non-null     object 
 2   Battery Temperature (Start) [°C]  70 non-null     float64
 3   Battery Temperature (End)         70 non-null     float64
 4   Battery State of Charge (Start)   70 non-null     float64
 5   Battery State of Charge (End)     70 non-null     float64
 6   Unnamed: 8                        70 non-null     float64
 7   Ambient Temperature (Start) [°C]  70 non-null     float64
 8   Target Cabin Temperature          70 non-null     float64
 9   Distance [km]                     70 non-null     float64
 10  Duration [min]                    70 non-null     float64
dtypes: float64(9), object(2)
memory usage: 6.6+ KB


## **Exploratory Data Analysis:**

