## Pandas series as index objects

### Analysing columns / pandas Series

In [1]:
import pandas as pd

In [2]:
cars = pd.read_csv(r'C:\Users\shiv\Documents\GitHub\acies-training\12-feb\cars.csv')
cars.head()

    mpg  cylinders  displacement  horsepower  weight  acceleration  model_year  origin                       name
0  18.0          8         307.0       130.0    3504          12.0          70     usa  chevrolet chevelle malibu
1  15.0          8         350.0       165.0    3693          11.5          70     usa          buick skylark 320
2  18.0          8         318.0       150.0    3436          11.0          70     usa         plymouth satellite
3  16.0          8         304.0       150.0    3433          12.0          70     usa              amc rebel sst
4  17.0          8         302.0       140.0    3449          10.5          70     usa                ford torino


In [3]:
cars.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 398 entries, 0 to 397
Data columns (total 9 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   mpg           398 non-null    float64
 1   cylinders     398 non-null    int64  
 2   displacement  398 non-null    float64
 3   horsepower    392 non-null    float64
 4   weight        398 non-null    int64  
 5   acceleration  398 non-null    float64
 6   model_year    398 non-null    int64  
 7   origin        398 non-null    object 
 8   name          398 non-null    object 
dtypes: float64(4), int64(3), object(2)
memory usage: 28.1+ KB
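
The info() output above lists 392 non-null values for horsepower out of 398 entries, so 6 values are missing in that column. A minimal sketch of counting missing values per column directly, using a small made-up frame (`df_demo` and its values are hypothetical, not taken from cars.csv):

```python
import pandas as pd

# Hypothetical stand-in for the cars data; None becomes NaN in a float column
df_demo = pd.DataFrame({
    "mpg": [18.0, 15.0, 18.0],
    "horsepower": [130.0, None, 150.0],
})

# isna() marks missing cells, sum() counts them per column
missing = df_demo.isna().sum()
print(missing)
```

On the real data, `cars.isna().sum()` should report the same gap that info() shows for horsepower.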


Select the numerical column "mpg", create a copy, and save the resulting pandas Series in the variable mpg!

In [4]:
mpg = cars.mpg.copy()
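
Why copy at all? The sketch below illustrates, on a made-up three-row frame (`cars_demo` is a hypothetical stand-in for cars), that changes to a copied column do not reach the original DataFrame:

```python
import pandas as pd

# Hypothetical three-row frame standing in for cars.csv
cars_demo = pd.DataFrame({"mpg": [18.0, 15.0, 18.0]})

mpg_demo = cars_demo["mpg"].copy()   # independent copy of the column
mpg_demo.iloc[0] = 99.0              # modify the copy only

print(cars_demo["mpg"].iloc[0])      # the original value is untouched
print(mpg_demo.iloc[0])
```

This is what makes the later in-place sorting of mpg safe: the cars DataFrame keeps its original order.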

In [5]:
mpg.head()

0    18.0
1    15.0
2    18.0
3    16.0
4    17.0
Name: mpg, dtype: float64

Get some summary statistics on the Series mpg! What is the mpg of the least fuel efficient car?

In [8]:
mpg.describe() #9.0 is mpg of least fuel efficient car

count    398.000000
mean      23.514573
std        7.815984
min        9.000000
25%       17.500000
50%       23.000000
75%       29.000000
max       46.600000
Name: mpg, dtype: float64
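
The result of describe() is itself a Series, so a single statistic can be picked out by its label instead of being read off the printout. A small sketch with made-up values (`mpg_demo` is hypothetical):

```python
import pandas as pd

mpg_demo = pd.Series([18.0, 9.0, 46.6, 23.0], name="mpg")  # made-up values

summary = mpg_demo.describe()
lowest = summary["min"]            # pick one statistic by its label
lowest_label = mpg_demo.idxmin()   # index label of the row holding the minimum

print(lowest, lowest_label)
```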

Get the maximum value in the Series mpg by explicitly calling the ... method! The most fuel-efficient car has an mpg of...?

In [9]:
mpg.max()

46.6
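
To find out which car that maximum belongs to, one option is idxmax(), which returns the index label of the largest value. The frame and car names below are invented for illustration:

```python
import pandas as pd

# Hypothetical frame; names and values are placeholders
cars_demo = pd.DataFrame({
    "mpg": [18.0, 46.6, 15.0],
    "name": ["car a", "car b", "car c"],
})

best_label = cars_demo["mpg"].idxmax()          # label of the highest mpg
best_name = cars_demo.loc[best_label, "name"]   # look up the matching name

print(best_label, best_name)
```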

Get the Frequency/Counts of all unique values in the Series mpg! What is the most frequent value?

In [11]:
mpg.value_counts() #13 is the most frequent value

13.0    20
14.0    19
18.0    17
15.0    16
26.0    14
        ..
34.2     1
38.1     1
37.2     1
32.1     1
32.7     1
Name: mpg, Length: 129, dtype: int64
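
Because value_counts() sorts by count in descending order by default, the most frequent value is the first index label of the result. A sketch on made-up values showing two ways to extract it programmatically:

```python
import pandas as pd

mpg_demo = pd.Series([13.0, 14.0, 13.0, 18.0, 13.0, 14.0])  # made-up values

counts = mpg_demo.value_counts()
most_frequent = counts.idxmax()   # index label with the highest count
top_count = counts.iloc[0]        # first row holds the highest count

modes = mpg_demo.mode()           # alternative: returns all modes as a Series

print(most_frequent, top_count)
```

mode() can return more than one value when several values tie for the highest count.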

Get the relative frequencies in the Series mpg! What is the relative frequency of the most frequent value?

In [13]:
mpg.value_counts(normalize = True) #0.0503 (20 of 398) is the relative freq

13.0    0.050251
14.0    0.047739
18.0    0.042714
15.0    0.040201
26.0    0.035176
          ...   
34.2    0.002513
38.1    0.002513
37.2    0.002513
32.1    0.002513
32.7    0.002513
Name: mpg, Length: 129, dtype: float64
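
The relative frequencies are the counts divided by the number of non-missing elements, so the top entry above is 20 / 398. A sketch on made-up values that checks this relationship by hand:

```python
import pandas as pd

s = pd.Series([13.0, 13.0, 14.0, 18.0])  # made-up values, no missing data

counts = s.value_counts()
rel = s.value_counts(normalize=True)
manual = counts / len(s)                 # same result computed by hand

max_diff = (rel - manual).abs().max()    # expected to be 0 apart from rounding
print(rel.iloc[0], max_diff)

# The figure from the output above: 20 occurrences out of 398 rows
print(round(20 / 398, 6))
```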

Sort the Series mpg from low to high! What is the second lowest value?

In [14]:
mpg.sort_values(ascending = True).iloc[1]

10.0
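
Note that `.iloc[1]` on the sorted Series returns the second element, which equals the minimum whenever the lowest value occurs more than once. If the second lowest distinct value is wanted, one possible approach is to drop duplicates first, sketched here on made-up values:

```python
import pandas as pd

s = pd.Series([10.0, 9.0, 9.0, 12.0])  # made-up values with a duplicated minimum

second_element = s.sort_values().iloc[1]                      # still the minimum
second_distinct = s.drop_duplicates().nsmallest(2).iloc[-1]   # next distinct value

print(second_element, second_distinct)
```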

Sort the Series mpg from high to low and save the changes by setting the inplace parameter to True!

In [15]:
mpg.sort_values(ascending = False, inplace = True)
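
With `inplace = True` the method returns None and modifies the Series itself, which is why the cell above shows no output. A sketch on made-up values comparing this with the default behaviour, where a new sorted Series is returned:

```python
import pandas as pd

s = pd.Series([17.0, 46.6, 44.6])  # made-up values

sorted_copy = s.sort_values(ascending=False)   # default: new Series, s unchanged
before = s.tolist()

result = s.sort_values(ascending=False, inplace=True)  # modifies s, returns None
after = s.tolist()

print(before, after, result)
```

Reassigning, as in `s = s.sort_values(ascending=False)`, gives the same end state and is a common alternative to `inplace`.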

Inspect the first 5 elements of mpg! What is the second highest value?

In [19]:
mpg.iloc[1]

44.6

In [17]:
mpg.head()

322    46.6
329    44.6
325    44.3
394    44.0
326    43.4
Name: mpg, dtype: float64
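
After sorting, the index labels (322, 329, ...) travel with their values, so position and label no longer coincide. The sketch below, on made-up values, contrasts positional access with `.iloc` and label access with `.loc`:

```python
import pandas as pd

s = pd.Series([18.0, 15.0, 46.6, 44.6])      # made-up values, labels 0..3
s_sorted = s.sort_values(ascending=False)    # labels now run 2, 3, 0, 1

by_position = s_sorted.iloc[1]   # second element of the sorted Series
by_label = s_sorted.loc[1]       # element whose index label is 1

print(by_position, by_label)
```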

Sort the Series mpg by the Index and save the changes!

In [20]:
mpg.sort_index(inplace = True)

In [21]:
mpg.head()

0    18.0
1    15.0
2    18.0
3    16.0
4    17.0
Name: mpg, dtype: float64
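
Sorting by index undoes the earlier sort by value, because every element kept its original label. A sketch of that round trip on made-up values:

```python
import pandas as pd

s = pd.Series([18.0, 15.0, 46.6])              # made-up values

shuffled = s.sort_values(ascending=False)      # order by value
restored = shuffled.sort_index()               # back to the original label order

print(restored.tolist())
print(list(restored.index))
```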

Miles per gallon (mpg) can be converted to liters per 100 kilometers with the following formula:
Liters per 100 kilometers = 235.21 / mpg

Create a new pandas Series l_per_100 by applying the above formula! Round the results to 2 decimals.

In [22]:
l_per_100 = (235.21/mpg).round(2)

Run and Inspect. What is the very first element?

In [23]:
l_per_100.iloc[0]

13.07
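
Where does 235.21 come from? Assuming the data uses US gallons (an assumption, the notebook does not say so explicitly), the factor follows from the definitions of the gallon and the mile. A short sketch of the arithmetic:

```python
# Unit definitions; the use of US gallons is an assumption
LITERS_PER_US_GALLON = 3.785411784
KM_PER_MILE = 1.609344

# mpg miles per gallon means (mpg * KM_PER_MILE) km per LITERS_PER_US_GALLON liters,
# so liters per 100 km = 100 * LITERS_PER_US_GALLON / (mpg * KM_PER_MILE)
factor = 100 * LITERS_PER_US_GALLON / KM_PER_MILE
print(round(factor, 2))

# First car in the data: 18.0 mpg
first = round(235.21 / 18.0, 2)
print(first)
```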

Get some summary statistics on the Series l_per_100! What is the average value?

In [25]:
l_per_100.describe() #avg is about 11.21

count    398.000000
mean      11.212789
std        3.901407
min        5.050000
25%        8.110000
50%       10.230000
75%       13.440000
max       26.130000
Name: mpg, dtype: float64
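
Two details in this output are worth a closer look. The result still carries the name mpg, because arithmetic on a Series keeps its name. And the mean of the converted values (about 11.21) is not the conversion of the mean mpg (235.21 / 23.514573 is roughly 10.0), since the formula is not linear. A sketch on made-up values (`mpg_demo` and `l_demo` are hypothetical):

```python
import pandas as pd

mpg_demo = pd.Series([10.0, 20.0, 40.0], name="mpg")  # made-up values

l_demo = (235.21 / mpg_demo).round(2)
print(l_demo.name)                         # name is inherited from mpg_demo

l_demo = l_demo.rename("l_per_100")        # give the result its own name

mean_of_converted = l_demo.mean()          # average of the converted values
converted_mean = 235.21 / mpg_demo.mean()  # conversion of the average mpg

print(mean_of_converted, converted_mean)
```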

Select the non-numerical column "origin", create a copy and save the column/Pandas Series in the variable origin!

In [27]:
origin = cars.origin.copy()

Inspect! The first 5 elements are all...?

In [28]:
origin.head()  #usa

0    usa
1    usa
2    usa
3    usa
4    usa
Name: origin, dtype: object

Call the describe() method on the non-numerical Series "origin"! What is the most frequent value?

In [29]:
origin.describe() #usa

count     398
unique      3
top       usa
freq      249
Name: origin, dtype: object
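
For a non-numerical Series, describe() reports count, unique, top and freq instead of mean and quantiles. The sketch below, on made-up values, shows how top and freq relate to value_counts():

```python
import pandas as pd

origin_demo = pd.Series(["usa", "japan", "usa", "europe", "usa"])  # made-up values

desc = origin_demo.describe()
counts = origin_demo.value_counts()

top_matches = desc["top"] == counts.idxmax()   # most frequent value
freq_matches = desc["freq"] == counts.max()    # how often it occurs

print(desc["top"], desc["freq"], top_matches, freq_matches)
```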

Get all unique values in the Series origin! Apart from the value usa, there are also the values...?

In [30]:
origin.unique()  #japan and europe

array(['usa', 'japan', 'europe'], dtype=object)
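
unique() returns the distinct values in order of first appearance, not sorted. When only the number of distinct values matters, nunique() gives it directly. A sketch on made-up values:

```python
import pandas as pd

origin_demo = pd.Series(["usa", "usa", "japan", "usa", "europe"])  # made-up values

values = list(origin_demo.unique())   # distinct values, order of first appearance
n_distinct = origin_demo.nunique()    # number of distinct values

print(values, n_distinct)
```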

Last but not least, count the frequencies in the Series origin! How often does the value europe appear?

In [33]:
origin.value_counts()  #europe appears 70 times

usa       249
japan      79
europe     70
Name: origin, dtype: int64
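
The three counts above add up to 249 + 79 + 70 = 398, the full length of the Series. A single count can also be extracted without reading the printout, either from the value_counts() result or with a boolean comparison. A sketch on made-up values:

```python
import pandas as pd

origin_demo = pd.Series(["usa", "japan", "usa", "europe", "usa"])  # made-up values

counts = origin_demo.value_counts()
europe_from_counts = counts["europe"]              # look up one label
europe_direct = (origin_demo == "europe").sum()    # count matching elements

print(europe_from_counts, europe_direct, counts.sum() == len(origin_demo))
```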