In [1]:
import pandas as pd
import numpy as np

In [2]:
drinks = pd.read_csv('http://bit.ly/drinksbycountry')
drinks.head()

       country  beer_servings  spirit_servings  wine_servings  total_litres_of_pure_alcohol  continent
0  Afghanistan              0                0              0                           0.0       Asia
1      Albania             89              132             54                           4.9     Europe
2      Algeria             25                0             14                           0.7     Africa
3      Andorra            245              138            312                          12.4     Europe
4       Angola            217               57             45                           5.9     Africa


In [3]:
drinks.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 193 entries, 0 to 192
Data columns (total 6 columns):
country                         193 non-null object
beer_servings                   193 non-null int64
spirit_servings                 193 non-null int64
wine_servings                   193 non-null int64
total_litres_of_pure_alcohol    193 non-null float64
continent                       193 non-null object
dtypes: float64(1), int64(3), object(2)
memory usage: 9.1+ KB


The object dtype usually means that a column stores strings. A pandas Series can also hold Python lists or Python dictionaries; in other words, a Series can store arbitrary Python objects. In that case pandas stores only a reference to each object and reports the dtype as object.
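A minimal sketch of that point, using three small made-up Series that are not part of the drinks data. The dtype is passed explicitly for the strings because, depending on the pandas version, strings may otherwise be inferred as a dedicated string dtype rather than object.

```python
import pandas as pd

# Hypothetical Series holding strings, lists and dictionaries
strings = pd.Series(['Asia', 'Europe'], dtype=object)
lists = pd.Series([[1, 2], [3, 4, 5]])
dicts = pd.Series([{'a': 1}, {'b': 2}])

# Each Series has dtype object, while the elements keep their Python type
for name, s in [('strings', strings), ('lists', lists), ('dicts', dicts)]:
    print(name, s.dtype, type(s.iloc[0]).__name__)
```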

Observe - __memory usage: 9.1+ KB__ means the data uses at least 9.1 KB of memory. The + is there because the object columns hold references to Python objects stored elsewhere. Pandas wants the info method to run fast, so by default it only counts the space taken by the references themselves, not by the objects they point to. The actual usage can therefore be higher, depending on the size of those objects.
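The difference between counting references and counting the objects themselves can be seen on a small example. This is only a sketch: the Series below is made up for illustration, and the exact deep figure depends on the Python build.

```python
import pandas as pd

# Hypothetical object column: 1,000 references to Python strings
s = pd.Series(['North America', 'Europe'] * 500, dtype=object)

# Shallow count: only the references stored in the column
shallow = s.memory_usage(index=False, deep=False)

# Deep count: the references plus the strings they point to
deep = s.memory_usage(index=False, deep=True)

print(shallow)
print(deep)
```

The shallow number is what the default info output is based on, which is why it is reported with a + sign.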

In [4]:
drinks.info(memory_usage = 'deep')

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 193 entries, 0 to 192
Data columns (total 6 columns):
country                         193 non-null object
beer_servings                   193 non-null int64
spirit_servings                 193 non-null int64
wine_servings                   193 non-null int64
total_litres_of_pure_alcohol    193 non-null float64
continent                       193 non-null object
dtypes: float64(1), int64(3), object(2)
memory usage: 30.4 KB


In [8]:
## Memory usage of drinks for each column
drinks.memory_usage(deep = True)

Index                              80
country                         12588
beer_servings                    1544
spirit_servings                  1544
wine_servings                    1544
total_litres_of_pure_alcohol     1544
continent                       12332
dtype: int64

In [9]:
# Total memory usage
drinks.memory_usage(deep = True).sum()

31176
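The numbers reported so far can be checked by hand. The sketch below copies the byte counts from the output above and assumes 8 bytes per int64 or float64 value, 8 bytes per object reference (a 64-bit build), and that info reports KB as multiples of 1024 bytes.

```python
# Byte counts copied from the memory_usage(deep=True) output
deep_bytes = {
    'Index': 80,
    'country': 12588,
    'beer_servings': 1544,
    'spirit_servings': 1544,
    'wine_servings': 1544,
    'total_litres_of_pure_alcohol': 1544,
    'continent': 12332,
}
n_rows = 193

# Deep total, in bytes and in KB
deep_total = sum(deep_bytes.values())
print(deep_total)                    # 31176
print(round(deep_total / 1024, 1))   # 30.4

# Assumption: 8 bytes per numeric value or per object reference
per_column = n_rows * 8
print(per_column)                    # 1544

# Shallow total: index plus six columns of 8-byte entries
shallow_total = deep_bytes['Index'] + 6 * per_column
print(shallow_total / 1024)          # 9.125
```

Under these assumptions the shallow total matches the 9.1+ KB figure and the deep total matches the 30.4 KB figure.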

So the object columns take up most of the space: about 12 KB each, against about 1.5 KB for each numeric column.

One way to make the DataFrame more space efficient:
- Store the strings as integers, since integers are more space efficient than strings.

In [10]:
drinks['continent'].unique()

array(['Asia', 'Europe', 'Africa', 'North America', 'South America',
       'Oceania'], dtype=object)

In [11]:
drinks['continent'].head()

0      Asia
1    Europe
2    Africa
3    Europe
4    Africa
Name: continent, dtype: object
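The continent column has only six distinct values repeated over 193 rows, so it is a natural candidate for integer storage. One possible way to do this, shown here as a sketch on a made-up Series rather than on the drinks data, is the pandas category dtype, which keeps each distinct string once and stores small integer codes per row.

```python
import pandas as pd

# Hypothetical column with a few distinct strings repeated many times
continents = ['Asia', 'Europe', 'Africa', 'Europe', 'Africa', 'Oceania']
as_object = pd.Series(continents * 50, dtype=object)

# Convert to category: distinct strings are stored once, rows hold integer codes
as_category = as_object.astype('category')

print(as_category.cat.categories)    # the distinct strings, sorted
print(as_category.cat.codes.head())  # the integer code stored for each row

# Compare the deep memory usage of both representations
print(as_object.memory_usage(index=False, deep=True))
print(as_category.memory_usage(index=False, deep=True))
```

The saving grows with the number of rows and shrinks as the number of distinct values approaches the number of rows.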