---
title: Summary Functions and Maps
tags: [jupyter]
keywords: pandas
summary: "Extracting insights from data using functions and maps."
mlType: dataFrame
infoType: pandas
sidebar: pandas_sidebar
permalink: __AutoGenThis__
notebookfilename:  __AutoGenThis__
---

In [1]:
import sys

sys.path.append("../")

In [2]:
import pandas as pd
from pprint import pprint

# Pandas Options

In [7]:
pd.set_option('display.max_rows', 8)

# I/O

In [4]:
reviews = pd.read_csv("../data/winemag-data-130k-v2.csv", index_col=0)

In [5]:
reviews.head()

| | country | description | designation | points | price | province | region_1 | region_2 | taster_name | taster_twitter_handle | title | variety | winery |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Italy | Aromas include tropical fruit, broom, brimston... | Vulkà Bianco | 87 | | Sicily & Sardinia | Etna | | Kerin O’Keefe | @kerinokeefe | Nicosia 2013 Vulkà Bianco (Etna) | White Blend | Nicosia |
| 1 | Portugal | This is ripe and fruity, a wine that is smooth... | Avidagos | 87 | 15.0 | Douro | | | Roger Voss | @vossroger | Quinta dos Avidagos 2011 Avidagos Red (Douro) | Portuguese Red | Quinta dos Avidagos |
| 2 | US | Tart and snappy, the flavors of lime flesh and... | | 87 | 14.0 | Oregon | Willamette Valley | Willamette Valley | Paul Gregutt | @paulgwine | Rainstorm 2013 Pinot Gris (Willamette Valley) | Pinot Gris | Rainstorm |
| 3 | US | Pineapple rind, lemon pith and orange blossom ... | Reserve Late Harvest | 87 | 13.0 | Michigan | Lake Michigan Shore | | Alexander Peartree | | St. Julian 2013 Reserve Late Harvest Riesling ... | Riesling | St. Julian |
| 4 | US | Much like the regular bottling from 2012, this... | Vintner's Reserve Wild Child Block | 87 | 65.0 | Oregon | Willamette Valley | Willamette Valley | Paul Gregutt | @paulgwine | Sweet Cheeks 2012 Vintner's Reserve Wild Child... | Pinot Noir | Sweet Cheeks |


# Summary Functions

Pandas provides many simple **summary functions** (not an official name) which restructure the data in some useful way.

## describe()

This method generates a high-level summary of the attributes of the given column. It is **type-aware**, meaning that its output changes based on the data type of the input. For numerical data it reports the count, mean, standard deviation, minimum, quartiles and maximum; for string data it reports the count, the number of unique values, the most frequent value and its frequency.

In [8]:
reviews.points.describe()

count    129971.000000
mean         88.447138
std           3.039730
min          80.000000
25%          86.000000
50%          88.000000
75%          91.000000
max         100.000000
Name: points, dtype: float64

In [9]:
reviews.taster_name.describe()

count         103727
unique            19
top       Roger Voss
freq           25514
Name: taster_name, dtype: object

This is very useful for getting quick statistics from a DataFrame or Series.
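To check the type-aware behavior by hand, here is a minimal sketch on a made-up frame. The `toy` DataFrame, its column names and its values are illustrative assumptions, not part of the wine dataset.

```python
import pandas as pd

# Hypothetical toy data, not taken from the wine reviews file
toy = pd.DataFrame({
    "points": [87, 90, 93],
    "taster": ["Ann", "Bob", "Ann"],
})

# Numeric column: count, mean, std, min, quartiles, max
numeric_summary = toy["points"].describe()

# String column: count, unique, top, freq
string_summary = toy["taster"].describe()

print(numeric_summary)
print(string_summary)
```

Individual statistics can be read from the result by label, for example `numeric_summary["mean"]` or `string_summary["top"]`.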

## mean()

Sometimes it is useful to obtain the mean of a column. We can call the mean() method on the column.

In [10]:
reviews.points.mean()

88.44713820775404

## unique()

If a column holds a list of strings, we can use the unique() method to get its distinct values.

In [11]:
reviews.country.unique()

array(['Italy', 'Portugal', 'US', 'Spain', 'France', 'Germany',
       'Argentina', 'Chile', 'Australia', 'Austria', 'South Africa',
       'New Zealand', 'Israel', 'Hungary', 'Greece', 'Romania', 'Mexico',
       'Canada', nan, 'Turkey', 'Czech Republic', 'Slovenia',
       'Luxembourg', 'Croatia', 'Georgia', 'Uruguay', 'England',
       'Lebanon', 'Serbia', 'Brazil', 'Moldova', 'Morocco', 'Peru',
       'India', 'Bulgaria', 'Cyprus', 'Armenia', 'Switzerland',
       'Bosnia and Herzegovina', 'Ukraine', 'Slovakia', 'Macedonia',
       'China', 'Egypt'], dtype=object)

## value_counts()

Sometimes it is useful to see how often each unique value occurs; for this we can use the value_counts() method.

In [12]:
reviews.country.value_counts()

US          54504
France      22093
Italy       19540
Spain        6645
            ...  
Armenia         2
Slovakia        1
China           1
Egypt           1
Name: country, Length: 43, dtype: int64
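Note that the unique() output above lists 44 entries, one of which is `nan`, while value_counts() reports a length of 43. By default value_counts() leaves missing values out. A small sketch of the difference, using a hypothetical `country` Series made up for illustration:

```python
import numpy as np
import pandas as pd

# Hypothetical toy column with one missing value
country = pd.Series(["US", "France", "US", np.nan, "Italy", "US"])

# unique() keeps the missing value as one of the distinct entries
distinct = country.unique()

# value_counts() drops missing values by default and sorts by count
counts = country.value_counts()

# Passing dropna=False counts the missing values as well
counts_with_nan = country.value_counts(dropna=False)

print(distinct)
print(counts)
print(counts_with_nan)
```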

# Maps

A **map** is a term, borrowed from mathematics, for a function that takes one set of values and "maps" them to another set of values.  We often have a need for creating new representations from existing data, or for transforming data from the format it is now to the format that we want it to be later.  This is done using **maps**.

Two methods are commonly used for this: map() and apply().

For example, suppose we want to re-center the wine scores so that their mean is 0.

## map()

In [16]:
reviewPointMean = reviews.points.mean()
reviews.points.map(lambda p: p - reviewPointMean)

0        -1.447138
1        -1.447138
2        -1.447138
3        -1.447138
            ...   
129967    1.552862
129968    1.552862
129969    1.552862
129970    1.552862
Name: points, Length: 129971, dtype: float64

In the above example we use an anonymous function (a lambda) to shift the distribution so that it is centered on 0.

Note that the function you pass to **map()** should expect a single value from the Series  and return a transformed version of that value. map() returns a new Series where all the values have been transformed by your function.
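For a simple arithmetic shift like this one, pandas also lets you subtract a scalar from a Series directly, which gives the same values without calling a Python function per element. The sketch below compares the two on a small hypothetical `points` Series (the values are made up for illustration):

```python
import pandas as pd

# Hypothetical toy scores, not taken from the wine reviews file
points = pd.Series([87, 90, 93], name="points")
mean_val = points.mean()

# Element-by-element transformation with map()
centered_map = points.map(lambda p: p - mean_val)

# Same result using built-in Series arithmetic
centered_vec = points - mean_val

print(centered_map.tolist())
print(centered_map.equals(centered_vec))
```

map() remains the more flexible choice when the transformation cannot be written as plain arithmetic on the column.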

### Using your own function in map

You can also use your own function: wrap it in a lambda that passes each value of the Series, along with any extra arguments, to your function.

In [21]:
def myReMean(x, meanVal):
    return x - meanVal

revPntMean = reviews.points.mean()
reviews.points.map(lambda x: myReMean(x, revPntMean))

0        -1.447138
1        -1.447138
2        -1.447138
3        -1.447138
            ...   
129967    1.552862
129968    1.552862
129969    1.552862
129970    1.552862
Name: points, Length: 129971, dtype: float64
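The lambda wrapper is one way to supply the extra argument. Two alternatives that should behave the same are `functools.partial` and the `args` parameter of `Series.apply`. This is a sketch on a hypothetical `points` Series; the name `my_remean` is an illustrative stand-in for the function above.

```python
from functools import partial

import pandas as pd

def my_remean(x, mean_val):
    # Shift a single value by the supplied mean
    return x - mean_val

# Hypothetical toy scores, not taken from the wine reviews file
points = pd.Series([87, 90, 93], name="points")
mean_val = points.mean()

# Option 1: lambda wrapper, as in the cell above
via_lambda = points.map(lambda x: my_remean(x, mean_val))

# Option 2: bind the extra argument ahead of time
via_partial = points.map(partial(my_remean, mean_val=mean_val))

# Option 3: Series.apply forwards extra positional arguments via args
via_apply = points.apply(my_remean, args=(mean_val,))

print(via_lambda.tolist())
```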

## apply()

To achieve the same result we can also use the apply() method, to which we pass our own function.

apply() is the equivalent method if we want to **transform a whole DataFrame** by calling a custom method on each row.

In [23]:
def remean_points(row):
    row.points = row.points - revPntMean
    return row

reviews.apply(remean_points, axis='columns')

| | country | description | designation | points | price | province | region_1 | region_2 | taster_name | taster_twitter_handle | title | variety | winery |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Italy | Aromas include tropical fruit, broom, brimston... | Vulkà Bianco | -1.447138 | | Sicily & Sardinia | Etna | | Kerin O’Keefe | @kerinokeefe | Nicosia 2013 Vulkà Bianco (Etna) | White Blend | Nicosia |
| 1 | Portugal | This is ripe and fruity, a wine that is smooth... | Avidagos | -1.447138 | 15.0 | Douro | | | Roger Voss | @vossroger | Quinta dos Avidagos 2011 Avidagos Red (Douro) | Portuguese Red | Quinta dos Avidagos |
| 2 | US | Tart and snappy, the flavors of lime flesh and... | | -1.447138 | 14.0 | Oregon | Willamette Valley | Willamette Valley | Paul Gregutt | @paulgwine | Rainstorm 2013 Pinot Gris (Willamette Valley) | Pinot Gris | Rainstorm |
| 3 | US | Pineapple rind, lemon pith and orange blossom ... | Reserve Late Harvest | -1.447138 | 13.0 | Michigan | Lake Michigan Shore | | Alexander Peartree | | St. Julian 2013 Reserve Late Harvest Riesling ... | Riesling | St. Julian |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 129967 | US | Citation is given as much as a decade of bottl... | | 1.552862 | 75.0 | Oregon | Oregon | Oregon Other | Paul Gregutt | @paulgwine | Citation 2004 Pinot Noir (Oregon) | Pinot Noir | Citation |
| 129968 | France | Well-drained gravel soil gives this wine its c... | Kritt | 1.552862 | 30.0 | Alsace | Alsace | | Roger Voss | @vossroger | Domaine Gresser 2013 Kritt Gewurztraminer (Als... | Gewürztraminer | Domaine Gresser |
| 129969 | France | A dry style of Pinot Gris, this is crisp with ... | | 1.552862 | 32.0 | Alsace | Alsace | | Roger Voss | @vossroger | Domaine Marcel Deiss 2012 Pinot Gris (Alsace) | Pinot Gris | Domaine Marcel Deiss |
| 129970 | France | Big, rich and off-dry, this is powered by inte... | Lieu-dit Harth Cuvée Caroline | 1.552862 | 21.0 | Alsace | Alsace | | Roger Voss | @vossroger | Domaine Schoffit 2012 Lieu-dit Harth Cuvée Car... | Gewürztraminer | Domaine Schoffit |


If we had called **reviews.apply()** with **axis='index'**, then instead of passing a function to transform each row, we would need to pass a function to transform each column.
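The difference between the two axes can be sketched on a small numeric frame. The `toy` DataFrame and the helper `center_column` are hypothetical names introduced only for this illustration.

```python
import pandas as pd

# Hypothetical toy data with two numeric columns
toy = pd.DataFrame({
    "points": [87, 90, 93],
    "price": [15.0, 14.0, 13.0],
})

def center_column(col):
    # With axis='index', col is an entire column (a Series)
    return col - col.mean()

# The function is called once per column
centered = toy.apply(center_column, axis="index")

# With axis='columns', the function is called once per row instead
row_totals = toy.apply(lambda row: row["points"] + row["price"], axis="columns")

print(centered)
print(row_totals.tolist())
```

In both cases apply() returns a new object; `toy` itself is left unchanged.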