# Find the best profile for electric car motor temperatures


First things first, the imports and the data:

In [None]:
import cufflinks as cf
import plotly.offline
import pandas as pd
import numpy as np

# Render plots locally in the notebook instead of sending them to Plotly's servers
cf.go_offline()
cf.set_config_file(offline=True, world_readable=True)

In [None]:
data = pd.read_csv("pmsm_temperature_data.csv") 
data.head()

## Column Description

Before we get to know our data better, let's first clarify what all these features mean:

**ambient:** Ambient temperature as measured by a thermal sensor located close to the stator.  
**coolant:** Coolant temperature. The motor is water cooled. Measurement is taken at outflow.  
**u_d:** Voltage d-component  
**u_q:** Voltage q-component  
**motor_speed:** Motor speed  
**torque:** Torque induced by current.  
**i_d:** Current d-component  
**i_q:** Current q-component  
**pm:** Permanent Magnet surface temperature representing the rotor temperature. This was measured with an infrared thermography unit.  
**stator_yoke:** Stator yoke temperature measured with a thermal sensor.  
**stator_tooth:** Stator tooth temperature measured with a thermal sensor.  
**stator_winding:** Stator winding temperature measured with a thermal sensor.  
**profile_id:** Each measurement session has a unique ID. Make sure not to try to estimate from one session onto the other as they are  
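
The note on `profile_id` suggests treating each measurement session separately. A minimal sketch of one way to do that with `groupby`, using a tiny made-up frame (`toy` and its values are hypothetical; the real `data` could be handled the same way):

```python
import pandas as pd

# Hypothetical toy frame standing in for the real data
toy = pd.DataFrame({
    "profile_id": [4, 4, 4, 6, 6],
    "pm": [0.1, 0.2, 0.3, -0.5, -0.4],
})

# One sub-frame per measurement session, keyed by its profile_id
sessions = {pid: frame for pid, frame in toy.groupby("profile_id")}

# Number of measurements recorded in each session
rows_per_session = toy.groupby("profile_id").size()
print(rows_per_session)
```

Keeping the sessions in a dictionary makes it easy to fit or evaluate on one session at a time without accidentally mixing rows from different runs.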

## Goal

Get an overview of the correlations in the data: which features make sense to keep, and which are redundant or have a lot of NA values. The final goal is to find the settings at which our motor is most energy efficient. Let's see how far we can get!
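
Checking for NA values could look roughly like the sketch below. The frame `toy` and its values are invented for illustration; on the real data the same call would be made on `data`:

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame with a few missing values
toy = pd.DataFrame({
    "ambient": [0.1, np.nan, 0.3, 0.4],
    "coolant": [1.0, 1.1, 1.2, 1.3],
    "torque": [np.nan, np.nan, 0.5, 0.6],
})

# Fraction of missing values per column, worst column first
na_fraction = toy.isna().mean().sort_values(ascending=False)
print(na_fraction)
```

Columns near the top of this list are candidates for dropping or imputing; columns at 0 need no special handling.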

## Visual inspection


### Histogram

Probably one of the easiest visualisations is a histogram. Here are the three *main* features:


In [None]:
data["ambient"].iplot(kind="histogram", bins=1, theme="white", title="ambient",xTitle='ambient', yTitle='Count')

In [None]:
data["coolant"].iplot(kind="histogram", bins=1, theme="white", title="coolant",xTitle='coolant', yTitle='Count')

In [None]:
data["motor_speed"].iplot(kind="histogram", bins=1, theme="white", title="motor_speed",xTitle='motor_speed', yTitle='Count')

The values do not make a lot of sense to me at first glance; they might have been standardized (z-score normalization). We can check that easily with the built-in pandas function *describe*, which gives us, among other things, the mean of each column (which should be about 0) and the standard deviation (which should be about 1):

In [None]:
data.describe().round(2)

Exactly what I expected: the mean of every column (except profile_id) is zero, and the standard deviation (std in the table above) is 1.
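
The same check can also be done programmatically instead of reading the table by eye. The sketch below builds a small synthetic frame (`raw` and `toy` are hypothetical stand-ins, not the real data), standardizes it, and then tests mean and standard deviation with a tolerance that mirrors the rounding to two decimals:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
# Hypothetical raw measurements, then z-scored column by column
raw = pd.DataFrame({
    "ambient": rng.normal(25.0, 3.0, 1000),
    "coolant": rng.normal(60.0, 10.0, 1000),
})
toy = (raw - raw.mean()) / raw.std(ddof=0)
toy["profile_id"] = rng.integers(1, 5, 1000)

# profile_id is an identifier, not a measurement, so leave it out of the check
features = toy.drop(columns="profile_id")
means_ok = np.allclose(features.mean(), 0.0, atol=1e-2)
stds_ok = np.allclose(features.std(ddof=0), 1.0, atol=1e-2)
print(means_ok, stds_ok)
```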

### Heatmap

Let's start with a simple heatmap; this will give us some first insights into the correlations in the data:

In [None]:
data.corr().iplot(kind='heatmap',colorscale="blues", title="Feature Correlation Matrix")
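
A heatmap is good for a first impression, but for spotting redundant features it can help to list the most strongly correlated pairs explicitly. A minimal sketch on made-up columns (`a`, `b`, `c` are hypothetical; on the real data `toy` would be replaced by `data`):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
base = rng.normal(size=500)
# Hypothetical columns: "b" is nearly a copy of "a", "c" is independent noise
toy = pd.DataFrame({
    "a": base,
    "b": base + rng.normal(scale=0.05, size=500),
    "c": rng.normal(size=500),
})

corr = toy.corr().abs()
# Keep only the upper triangle so each pair appears once and the diagonal is dropped
mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)
pairs = corr.where(mask).stack().dropna().sort_values(ascending=False)
print(pairs)
```

Pairs with an absolute correlation close to 1 carry nearly the same information, so one of the two columns is a candidate for removal.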


### Pairplot

Next let's try a pairplot, which gives us deeper insight into the pairwise relationships between the features:

In [None]:
import seaborn as sns
import matplotlib.pyplot as plt

plt.rcParams.update({'font.size': 22})

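# `size` was renamed to `height` in seaborn 0.9
g = sns.pairplot(data, height=2.5)
# Hide the upper triangle, which only mirrors the lower one
for i, j in zip(*np.triu_indices_from(g.axes, 1)):
    g.axes[i, j].set_visible(False)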

## PCA



In [None]:
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Standardize the data to have a mean of ~0 and a variance of 1
X_std = StandardScaler().fit_transform(data)
# Create a PCA instance: pca
pca = PCA(n_components=13)
principalComponents = pca.fit_transform(X_std)
# Plot the explained variances
features = range(pca.n_components_)
plt.bar(features, pca.explained_variance_ratio_, color='black')
plt.xlabel('PCA features')
plt.ylabel('explained variance ratio')
plt.xticks(features)
# Save components to a DataFrame
PCA_components = pd.DataFrame(principalComponents)


We see a big drop after the first **three components** and then another, smaller but still noticeable, drop after **5** components. Let's cross-check with another method:
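
Reading drops off a bar chart is somewhat subjective. One common complement is the cumulative explained variance with a threshold. The sketch below uses plain NumPy (an SVD instead of scikit-learn's `PCA`) on synthetic data; the mixing matrix and the 95 % threshold are assumptions chosen only for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical data: 6 observed columns driven by only 2 latent factors plus small noise
latent = rng.normal(size=(2000, 2))
mixing = np.array([[1.0, 0.8, 0.6, 0.0, 0.2, -0.5],
                   [0.0, 0.3, -0.7, 1.0, 0.9, 0.5]])
X = latent @ mixing + 0.01 * rng.normal(size=(2000, 6))

# Standardize, then get the variance along each principal axis from the singular values
X_std = (X - X.mean(axis=0)) / X.std(axis=0)
singular_values = np.linalg.svd(X_std, compute_uv=False)
explained_ratio = singular_values**2 / np.sum(singular_values**2)
cumulative = np.cumsum(explained_ratio)

# Smallest number of components that explains at least 95 % of the variance
n_keep = int(np.searchsorted(cumulative, 0.95) + 1)
print(cumulative.round(3), n_keep)
```

With a fitted scikit-learn `PCA`, the same cumulative sum can be taken over `pca.explained_variance_ratio_`.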

In [None]:
from sklearn.cluster import KMeans

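data_cluster = data
n_cluster = range(1, 13)
# Fit one KMeans model per candidate number of clusters
kmeans = [KMeans(n_clusters=i).fit(data_cluster) for i in n_cluster]
# score() returns the negative inertia, so values closer to 0 are better
scores = [model.score(data_cluster) for model in kmeans]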

fig, ax = plt.subplots(figsize=(10,6))
ax.plot(n_cluster, scores)
plt.xlabel('Number of Clusters')
plt.ylabel('Score')
plt.title('Elbow Curve')
plt.show();


The *elbow point*, which is the important point to look at, is at **three** here as well. Note that this curve counts clusters rather than principal components, so the agreement with the PCA result is only suggestive. Interestingly, the drop between 4 and 5 that showed up in the PCA analysis is not visible here.
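
Locating the elbow by eye can be backed up with a simple heuristic: pick the point where the curve bends most sharply, measured by the second difference. This is only one of several possible heuristics, and the inertia values below are invented for illustration. With scikit-learn's `score`, which returns the negative inertia, the values would have to be negated first:

```python
import numpy as np

# Hypothetical inertia values for k = 1..6 (lower is better); not taken from the real data
ks = np.arange(1, 7)
inertia = np.array([1000.0, 600.0, 200.0, 170.0, 150.0, 140.0])

# The second difference measures how sharply the curve bends at each interior k
bend = np.diff(inertia, n=2)
elbow_k = int(ks[1:-1][np.argmax(bend)])
print(elbow_k)
```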

Anyway, let's take a look at our data:

In [None]:
plt.scatter(PCA_components[0], PCA_components[1], alpha=.05, color='black')
plt.xlabel('PCA 1')
plt.ylabel('PCA 2')

No clear structure is visible in two dimensions; let's try three:

In [None]:
from mpl_toolkits.mplot3d import Axes3D 

plt.rcParams.update({'font.size': 12})


fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')

ax.scatter(PCA_components[0], PCA_components[1], PCA_components[2], alpha=.05)

plt.show()

No patterns here either... maybe a *4th dimension* will give us some clarity:

In [None]:
from mpl_toolkits.mplot3d import Axes3D
import matplotlib.pyplot as plt
import numpy as np

fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')

x = PCA_components[0]
y = PCA_components[1]
z = PCA_components[2]
c = PCA_components[3]

# Encode the 4th component as color; pass the colormap by name (plt.hot() returns None)
img = ax.scatter(x, y, z, c=c, cmap='hot')
fig.colorbar(img)
plt.show()