Day 11: How do I change the data type of a pandas Series?

In [3]:
import pandas as pd

In [4]:
drinks = pd.read_csv('https://bit.ly/drinksbycountry')

In [6]:
drinks.head()

       country  beer_servings  spirit_servings  wine_servings  total_litres_of_pure_alcohol  continent
0  Afghanistan              0                0              0                           0.0       Asia
1      Albania             89              132             54                           4.9     Europe
2      Algeria             25                0             14                           0.7     Africa
3      Andorra            245              138            312                          12.4     Europe
4       Angola            217               57             45                           5.9     Africa


To check the data type of each column, we can use the dtypes attribute.
Here two columns (country and continent) are of type object, which is the type pandas uses for strings.

In [13]:
drinks.dtypes

country                          object
beer_servings                     int64
spirit_servings                   int64
wine_servings                     int64
total_litres_of_pure_alcohol    float64
continent                        object
dtype: object
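
As a small illustration of the attribute described above, here is a minimal sketch on a hand-made DataFrame (the `toy` frame and its values are made up for this example and are not the drinks dataset). It shows that a DataFrame exposes `dtypes` (one entry per column), while a single Series exposes `dtype`.

```python
import pandas as pd

# Hypothetical toy data standing in for the drinks dataset
toy = pd.DataFrame({
    'country': ['Albania', 'Algeria'],
    'beer_servings': [89, 25],
    'total_litres': [4.9, 0.7],
})

# DataFrame.dtypes returns one dtype per column
print(toy.dtypes)

# A single column (a Series) has a dtype attribute (singular)
beer_dtype = toy['beer_servings'].dtype
litres_dtype = toy['total_litres'].dtype
print(beer_dtype, litres_dtype)
```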

To convert beer_servings from int to float, we can use the astype method. astype returns a new Series, so the column in the DataFrame will not change unless you overwrite it.

In [15]:
drinks['beer_servings'] = drinks.beer_servings.astype(float)

In [16]:
drinks.dtypes

country                          object
beer_servings                   float64
spirit_servings                   int64
wine_servings                     int64
total_litres_of_pure_alcohol    float64
continent                        object
dtype: object
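
The point about overwriting can be checked with a short sketch. The `toy` frame below is a made-up stand-in, not the drinks data. It compares the column's dtype before and after assigning the result of astype back to the column.

```python
import pandas as pd

# Hypothetical toy column standing in for drinks.beer_servings
toy = pd.DataFrame({'beer_servings': [0, 89, 25]})

# astype returns a new Series; the original column is untouched
converted = toy['beer_servings'].astype(float)
dtype_before = toy['beer_servings'].dtype   # still an integer dtype

# Assigning the result back is what actually changes the column
toy['beer_servings'] = converted
dtype_after = toy['beer_servings'].dtype    # now float64

print(dtype_before, dtype_after)
```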

How do I define the type of each column while reading the CSV?
- We can pass the dtype parameter to read_csv to set the data type at the time the file is read.

In [17]:
drinks = pd.read_csv('https://bit.ly/drinksbycountry', dtype={'beer_servings': float})

In [18]:
drinks.dtypes

country                          object
beer_servings                   float64
spirit_servings                   int64
wine_servings                     int64
total_litres_of_pure_alcohol    float64
continent                        object
dtype: object
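
The dtype parameter accepts a dictionary, so several columns can be typed in one call. Here is a minimal sketch that reads from an in-memory string instead of the remote file. The `csv_text` contents and the choice of 'int32' for wine_servings are assumptions made only for illustration.

```python
import io
import pandas as pd

# Hypothetical in-memory CSV standing in for the remote file
csv_text = "country,beer_servings,wine_servings\nAlbania,89,54\nAlgeria,25,14\n"

# dtype maps column names to the types they should be parsed as
toy = pd.read_csv(
    io.StringIO(csv_text),
    dtype={'beer_servings': float, 'wine_servings': 'int32'},
)

print(toy.dtypes)
```

Columns that are not listed in the dictionary keep the type pandas infers for them.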

In [19]:
orders = pd.read_table('https://bit.ly/chiporders')

In [20]:
orders.head()

   order_id  quantity                              item_name                                 choice_description  item_price
0         1         1           Chips and Fresh Tomato Salsa                                                NaN       $2.39
1         1         1                                   Izze                                       [Clementine]       $3.39
2         1         1                       Nantucket Nectar                                            [Apple]       $3.39
3         1         1  Chips and Tomatillo-Green Chili Salsa                                                NaN       $2.39
4         2         2                           Chicken Bowl  [Tomatillo-Red Chili Salsa (Hot), [Black Beans...      $16.98


In [21]:
orders.dtypes

order_id               int64
quantity               int64
item_name             object
choice_description    object
item_price            object
dtype: object

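The item_price column is stored as strings because of the dollar sign. To do math on it, we can remove the dollar sign with the str.replace method and then cast the result to a numeric type.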

In [22]:
orders.item_price.str.replace('$', '', regex=False).astype(float).mean()

7.464335785374297
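
The same cleaning step can be tried on a few hand-typed prices. This is a sketch on made-up values (the `prices` Series is hypothetical, not the orders data). Passing `regex=False` is a defensive choice here: it makes pandas treat '$' as a literal character rather than as a regular-expression anchor, whatever the default is in the installed version.

```python
import pandas as pd

# Hypothetical price strings in the same format as item_price
prices = pd.Series(['$2.39', '$3.39', '$16.98'])

# regex=False treats '$' as a literal character, not a regex anchor
cleaned = prices.str.replace('$', '', regex=False).astype(float)

# Once the column is numeric, aggregations such as sum or mean work
total = cleaned.sum()
print(cleaned.dtype, total)
```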

The str.contains method returns a boolean Series that marks which item names contain 'Chicken'. To use these
values as a feature in ML models, we can convert them to 0 and 1 with astype(int).

In [24]:
orders.item_name.str.contains('Chicken').astype(int).head()

0    0
1    0
2    0
3    0
4    1
Name: item_name, dtype: int64
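
To make the two steps visible separately, here is a minimal sketch on a made-up list of item names (the `items` Series is hypothetical). It keeps the intermediate boolean Series before casting it to integers.

```python
import pandas as pd

# Hypothetical item names, similar to the item_name column
items = pd.Series(['Izze', 'Chicken Bowl', 'Chips', 'Chicken Burrito'])

# str.contains yields a boolean Series
has_chicken = items.str.contains('Chicken')

# Casting booleans to int maps False to 0 and True to 1
flags = has_chicken.astype(int)
print(flags.tolist())
```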