## Introduction

In the first part of this unit we will look at **functions**. You can think of these as self-contained mini programmes that you write to deal with a problem you will meet repeatedly. In the last unit we wrote a script to calculate BMI. If we wanted to incorporate a BMI calculation into other programmes we could copy the previous code, but a much better approach would be to write a function that automates the calculation and then use that function in the new programme. Then, if things change, you only have to change the function and not the other parts of your programme.

In the second part of this unit we'll look at a specific python module which can take a lot of the pain out of data analysis. This module is called [```pandas```](http://pandas.pydata.org/). The module contains many functions that carry out both mundane and more advanced tasks in data analysis and we'll see some of this in action.

## Functions

As stated in the introduction, functions are small self-contained routines that perform some computational task. You have already used a number of the built-in functions in python. One example is the ```max()``` function, which returns the maximum value of a list.

In [1]:
lst = [-4,5,-12,76,23,56,99,102,-112]
max(lst)

102

How would we go about writing a script that duplicates the functionality of ```max()```? We could simply iterate over the list with a ```for``` loop, compare each value to the largest value seen so far and, if the new value is bigger, keep it as the new maximum.

In [2]:
lst = [-4,5,-12,76,23,56,99,102,-112]
mx = None # no maximum found yet

for n in lst:

    if mx is None or n > mx: # first value, or bigger than the largest so far
        mx = n

print mx

102


We've just written a simple for loop to identify the largest number in a list. If we wanted to repeat the process of identifying the largest number in another list somewhere else in our programme we'd have to write another loop! Wouldn't it be easier to write the loop once and then just repeatedly use the same bit of code without all the re-typing (or copy/pasting)? That's exactly what functions allow you to do.

In python the syntax for writing functions is to begin with the ```def``` keyword, then type the name of your function followed by any **arguments** your function will take in parentheses, and then a colon. All the function code, the code that does the actual computation, is then indented (just like in loops). Once you've written the function you can **call** it from anywhere in your programme simply by typing the function name followed by the relevant arguments in parentheses. As ever it's easier with an example, so let's write our own function to get the maximum value in a list.

In [3]:
def lst_mx(lst):
    mx = None # no maximum found yet

    for n in lst:

        if mx is None or n > mx: # first value, or bigger than the largest so far
            mx = n

    return mx

So in the code above we have all the building blocks.

We start with the **def**inition - the ```def``` keyword, the function name and a placeholder for the argument it will take. Note we have called the placeholder ```lst``` in this example and ```lst``` is also the name we use inside the function. Essentially we will pass a variable called ```lst``` to the inside of the function where stuff will happen.

Next we have our indented function body. This contains the code that will do whatever computation we want; in this case calculate the maximum value of the ```lst``` variable. Finally we end our function with a **return** statement. The return statement returns some value out of the function so we can use that value. In contrast if we ended our function definition with a **print** statement we would see our value being printed to the screen but we couldn't do anything further with it. Using a ```return``` statement we can assign the output of our function to a new variable.

In [4]:
another_list = [14,52,67,12,-130,46,-26]
list_max = lst_mx(another_list)
print list_max

67


In the above example notice that the ```list``` that went into the function wasn't called ```lst```; that's just a placeholder for what goes into the function. Once we have passed some value to the function it is assigned the label ```lst``` inside the function and the computation is carried out.
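
The difference between ```return``` and ```print``` can be seen in a short sketch. The function names ```show_max``` and ```get_max``` are made up for this illustration, and ```print``` is called with parentheses so the snippet should run under both Python 2 and Python 3.

```python
def show_max(values):
    # Prints the largest value; the function itself returns None.
    print(max(values))

def get_max(values):
    # Hands the largest value back to the caller.
    return max(values)

numbers = [14, 52, 67, 12]
printed = show_max(numbers)   # the maximum appears on screen, but printed is None
returned = get_max(numbers)   # nothing is printed, but returned holds the maximum
doubled = returned * 2        # a returned value can be used in later code
```

Only the value that comes back through ```return``` can be stored and reused; the printed one is lost as soon as it has been displayed.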

## Function using ```dict``` to count things
One use of ```dictionaries``` in python is to count things (usually text). How does that work and what does it have to do with functions? Suppose we were given a text file and asked to count the number of times each letter appears. You could:

* create 26 variables, one for each letter of the alphabet. Then you could traverse the string and, for each character, increment the corresponding counter, probably using a chained conditional.
* create a list with 26 elements. Then you could convert each character to a number (using the built-in function ```ord```), use the number as an index into the list, and increment the appropriate counter.
* create a dictionary with characters as keys and counters as the corresponding values. The first time you see a character, you would add an item to the dictionary. After that you would increment the value of an existing item.

These all perform the same computation but the **implementation** is different in each. An implementation is a way of performing a computation and some implementations are more efficient than others. In the suggestion above using the ```dict``` we don’t have to create room for all 26 letters (some might not appear in the string at all) - we only have to make room for the letters that do appear.
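
For comparison, the second option (a list of 26 counters indexed with ```ord```) might look something like the sketch below. The function name ```letter_counts``` and the test word are made up for illustration, and the sketch assumes we only care about the letters A to Z.

```python
def letter_counts(text):
    # One counter per letter A-Z, all starting at zero.
    counts = [0] * 26
    for char in text.upper():
        position = ord(char) - ord('A')  # 'A' -> 0, 'B' -> 1, ..., 'Z' -> 25
        if 0 <= position < 26:           # skip anything that is not A-Z
            counts[position] += 1
    return counts

counts = letter_counts('banana')
a_count = counts[0]    # counter for 'A'
n_count = counts[13]   # counter for 'N'
```

Notice that the list always holds 26 counters, even though 'banana' only uses three different letters. This is the extra room that the ```dict``` version avoids.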

Here is some pseudocode for a function based on the ```dict``` implementation:

```
def a_function(input_string):
    create an empty dict
    for char in input_string:
        if char is not in dict:
            put the char in dict as a key with value 1
        else:
            dict[char] = dict[char]+1
    return dict
```
First we define our function with a ```str``` as input. In the function we create an empty ```dict```ionary. Then a ```for``` loop traverses the string. Each time through the loop, if the character ```char``` is not in the dictionary, we create a new item with key ```char``` and the initial value 1 (since we have seen this letter once). If ```char``` is already in the dictionary we increment that ```dict``` entry (```dict[char]```) by 1.

Let's see how that would look coded up and see it in action.

In [2]:
# define a counting function
def histogram(s):
    d = dict()
    for char in s: # 'char' avoids hiding python's built-in chr() function
        if char not in d:
            d[char] = 1
        else:
            d[char] += 1
    return d

f_loc = '../data/pparg_prot.fa' # location of the fasta file
f_hand = open(f_loc, 'r') # open the file for reading
lines  = f_hand.readlines() # get the lines
f_hand.close() # finished with the file
a_acids = '' # empty str
for i in range(1,len(lines)): # don't include the first line (fasta header)
    a_acids = a_acids + lines[i].strip()
aa_counts = histogram(a_acids) # use the function; returns a dict

for k in aa_counts.keys(): # loop through dict to get values
    print '%s: %d' % (k, aa_counts[k])

A: 26
C: 10
E: 32
D: 32
G: 19
F: 26
I: 33
H: 14
K: 39
M: 16
L: 50
N: 15
Q: 23
P: 23
S: 34
R: 19
T: 23
W: 1
V: 24
Y: 18


The name of the function is ```histogram```, which is a statistical term for a set of counters (or frequencies). One little bit of syntax you haven't seen yet is the use of ```+=```. This simply means increment the current value by whatever follows the ```+=``` sign.

```python
n = 1
n += 1 # add 1 to n
```

We have used the function to parse a fasta file containing the amino acids making up the human PPARG protein. The function returns a dictionary which we then loop over to extract the counts.
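
As an aside, python's standard library already ships a ready-made counting dictionary, ```collections.Counter```, which does the same job as our ```histogram``` function. The sketch below uses a short made-up sequence rather than the real protein file.

```python
from collections import Counter

sequence = 'MKKLAKGA'  # made-up amino acid string, for illustration only
aa_counts = Counter(sequence)  # behaves like a dict of character -> count

# most_common(1) returns a list holding the single most frequent (item, count) pair
top_residue, top_count = aa_counts.most_common(1)[0]
```

Writing the function by hand is still worthwhile because it shows what is going on underneath, but in day-to-day code the built-in tool saves some typing.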

## Putting it together 1

Write a function that will compute the square root of the sum of two numbers. 

Hint: The square root of a number is the same as the number raised to 0.5. Use the values 20 and 5 as input and you should get the value 5 back out.

In [13]:
## solution remove from student doc
def sqrt_sum(a,b):
    return (a+b)**0.5

sqrt_sum(20,5)

5.0

## Putting it together 2

Write a function that can find the longest side of a right angled triangle. Test it with the values 3 and 4 for the other two sides. You should get 5 back out again. For those for whom it's been a while - Pythagoras' theorem: $a^2 = b^2 + c^2$, where $a$ is the longest side and $b$ and $c$ are the other two sides.

In [12]:
## solution remove from student doc
def hypot(a,b):
    hyp = (a**2 + b**2)**0.5
    
    return hyp

hypot(3,4)

5.0

## Homework

Write a script that first defines a function to compute an average. Then ask the user to specify a *tab delimited* file and read that file in. Use the ```try/except``` construct to gracefully exit if the file doesn't exist. Once the file is acquired, print the column headers and ask the user which one they would like the average of. Use your function to compute the average of that column (libraries such as ```numpy``` and ```pandas``` already use ```mean``` as a function name, so choose a different name for your function). If the selected column is not numeric use ```try/except``` to gracefully exit. Use the 'marathon.csv' file in the data directory for this exercise and calculate the average of the Time column.

In [1]:
# solution - remove from student doc
import sys

def average(lst):
    # arithmetic mean; float() guards against integer division
    return sum(lst) / float(len(lst))


f_loc = '../data/marathon.csv'

try:
    f_hand = open(f_loc, 'r')
except IOError:
    print 'That file does not exist.'
    sys.exit() # nothing more we can do without the file

header = f_hand.readline().strip().split(',')
col_string = ', '.join(header)

print 'The column names are %s .' % col_string
mn_col = raw_input('Which column would you like to calculate the average of?')

try:
    ind = header.index(mn_col) # index for column we want
except ValueError:
    print 'Cannot find the required column.'
    sys.exit()

mn_data = []

for line in f_hand:
    try:
        data_point = float(line.strip().split(',')[ind])
    except ValueError:
        print 'Cannot convert data points to floats.'
        sys.exit()

    mn_data.append(data_point)

f_hand.close()
ave = average(mn_data) # compute once, after all rows are read

print 'The average %s is %.2f.' % (mn_col, ave)


The column names are Year, Gender, Time .
Which column would you like to calculate the average of?Time
The average Time is 2.39.


## The ```pandas``` module

In this second part of the unit we'll introduce the ```pandas``` module and see how it can make some of the data analysis tasks we carried out in the last unit easier. From the [website](http://pandas.pydata.org/):
>pandas is an open source, BSD-licensed library providing high-performance, easy-to-use data structures and data analysis tools for the Python programming language.

So what does that mean? Well, in a nutshell ```pandas``` is designed to make it easier to do the kind of data analysis we did in the last unit, where we divided a data set up into male and female and created derived variables (BMI in that case). The ```pandas``` module also makes it easier - well, easier than it was previously - to do statistical analysis on data.

To begin we'll import the ```pandas``` module to make the functions in the module available to our current python session.

In [1]:
import pandas as pd

We have made python aware that we want to use the ```pandas``` module and we have also told python that we want to refer to this module as ```pd``` rather than typing out ```pandas``` all the time. You'll see this idiom used quite frequently in python code. Many times the module name will be shortened to some conventional, de facto form. Some examples are:

```import numpy as np``` - this is a python module for numerical computing i.e. routines for calculus & linear algebra

```import scipy.stats as stats``` - scipy is another scientific computing module that contains a range of statistical routines

One thing to note about the second example above is that we have only imported part of the ```scipy``` module and specifically the part that has statistical routines. 
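
The same two import styles can be tried out with modules from python's standard library, so nothing extra needs to be installed. The aliases ```m``` and ```osp``` below are arbitrary choices made for this sketch, not established conventions.

```python
import math as m        # whole module, referred to by a shorter name
import os.path as osp   # only the path-handling part of the os module

root = m.sqrt(25)                                 # call a function via the alias
file_name = osp.basename('../data/marathon.csv')  # last part of a file path
```

After these imports ```math.sqrt``` is not available under its full name, only as ```m.sqrt```, which is why it pays to stick to the conventional aliases that other people will recognise.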

The ```pandas``` library uses two workhorse data structures to make it easier to work with data; these are the ```Series``` and the ```DataFrame```. The ```Series``` data object is one dimensional - like a single column or a single row from a spreadsheet. We can create and examine a ```Series``` object as shown below.

### The ```Series```

In [2]:
ser = pd.Series([1,19,-10,12.4,56.2])
ser

0     1.0
1    19.0
2   -10.0
3    12.4
4    56.2
dtype: float64

The constructor for the ```Series``` object takes its argument and transforms it into a ```pandas``` series. We use the ```pd.Series``` notation to tell python we want to use the ```Series``` function in the ```pandas``` library. When we examine the object we see that it contains the datatype (```dtype```) ```float64```. Don't worry about the 64 part (you may see 32), the useful part is knowing this ```Series``` contains ```float```s. The numbers to the left of our object are the **index** for each entry of the ```Series```. We use these to access individual data values and they can be changed or specified when we create our object.

In [3]:
ser = pd.Series([1,19,-10,12.4,56.2], index=['a', 'b', 'c', 'd','e'])
ser

a     1.0
b    19.0
c   -10.0
d    12.4
e    56.2
dtype: float64

In [4]:
ser.index

Index([u'a', u'b', u'c', u'd', u'e'], dtype='object')

You can also create a ```Series``` from a dictionary and the keys in the dictionary will be automatically assigned as the ```Series``` index. Super handy!

In [4]:
dct = {'a':1, 'b':2, 'c':3, 'd':4, 'e':5}
ser2 = pd.Series(dct)
ser2.index

Index([u'a', u'b', u'c', u'd', u'e'], dtype='object')

The ```u``` before each index entry in the above output stands for [**unicode**](http://en.wikipedia.org/wiki/Unicode) and this is one way of representing text characters. It makes no great difference for our purposes. Instead of having to create a slice using a particular value's position in the ```Series``` object (as we would have to do with a ```list```) we can use the index directly.

In [5]:
ser['c']

-10.0

If we want to select several values from the ```Series``` we can use the ```ix``` (short for index) indexer. Notice that we pass a single list of index labels inside the square brackets of ```.ix```, which is why the brackets are doubled.

In [6]:
ser.ix[['a','d']]

a     1.0
d    12.4
dtype: float64
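
A note for readers on a newer ```pandas```: the ```.ix``` indexer was deprecated in version 0.20 and removed in version 1.0. The sketch below shows the same selection with its two replacements, ```.loc``` (select by index label) and ```.iloc``` (select by integer position). It assumes the same ```ser``` object as above.

```python
import pandas as pd

ser = pd.Series([1, 19, -10, 12.4, 56.2], index=['a', 'b', 'c', 'd', 'e'])

by_label = ser.loc[['a', 'd']]    # select by index label
by_position = ser.iloc[[0, 3]]    # select by integer position (counting from 0)
```

Both lines pick out the same two entries. The rest of this unit keeps the ```.ix``` spelling, and for the examples here swapping in ```.loc``` should give the same results.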

Filtering is also easy because we don't have to write loops. Notice that we get a ```Series``` object back.

In [7]:
ser[ser > 10]

b    19.0
d    12.4
e    56.2
dtype: float64
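
It can help to see the filter in two steps. The comparison on its own produces a ```Series``` of ```True```/```False``` values (often called a boolean mask), and indexing with that mask keeps only the ```True``` entries. The variable names ```mask``` and ```filtered``` are just for this sketch.

```python
import pandas as pd

ser = pd.Series([1, 19, -10, 12.4, 56.2], index=['a', 'b', 'c', 'd', 'e'])

mask = ser > 10        # boolean Series: one True/False per entry
filtered = ser[mask]   # keeps only the entries where mask is True
```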

Arithmetical and mathematical transformations can also be done in one pass.

In [8]:
ser * 3

a      3.0
b     57.0
c    -30.0
d     37.2
e    168.6
dtype: float64

There's a nice tutorial on the pandas datatypes [here](http://www.gregreda.com/2013/10/26/intro-to-pandas-data-structures/) and you can read more about the kinds of operations you can carry out with ```Series``` objects. We'll move on to the second ```pandas``` data structure now - the ```DataFrame```.

### The ```DataFrame```

The ```DataFrame``` object is like a single sheet from a spreadsheet. It has two dimensions - rows and columns. The most useful feature of ```DataFrames``` is that each column can contain a different type of data e.g. one column could contain text (i.e. words), another could contain ```ints```, another could contain ```booleans``` and yet another ```floats```. This approach makes handling data much easier than maintaining separate lists or lists of lists. To construct a ```DataFrame``` by hand we can use a ```dict``` object where each key in the ```dict``` indexes a ```list```.

In [2]:
cities = {'city':['Stirling', 'Cork', 'Leeds', 'Swansea'], 'country':['Scotland','Ireland', 'England', 'Wales'], \
          'pop':[45750, 119230, 751500, 239000,]}
cities

{'city': ['Stirling', 'Cork', 'Leeds', 'Swansea'],
 'country': ['Scotland', 'Ireland', 'England', 'Wales'],
 'pop': [45750, 119230, 751500, 239000]}

At the moment ```cities``` is a ```dict``` object with each list indexed by a ```string```. To convert this to a ```DataFrame``` we simply use the ```pandas``` ```DataFrame``` function.

In [10]:
citiesDF = pd.DataFrame(cities)
citiesDF

|   | city     | country  | pop    |
|---|----------|----------|--------|
| 0 | Stirling | Scotland | 45750  |
| 1 | Cork     | Ireland  | 119230 |
| 2 | Leeds    | England  | 751500 |
| 3 | Swansea  | Wales    | 239000 |


We can get a single column back by indexing on the column header.

In [11]:
citiesDF['city']
# citiesDF.city # alternative dot notation

0    Stirling
1        Cork
2       Leeds
3     Swansea
Name: city, dtype: object

We can select rows of our data using the ```ix``` method.

In [12]:
citiesDF.ix[0:1] # only first two entries

|   | city     | country  | pop    |
|---|----------|----------|--------|
| 0 | Stirling | Scotland | 45750  |
| 1 | Cork     | Ireland  | 119230 |


In [13]:
citiesDF.ix[[1,3]] # non-contiguous selection

|   | city    | country | pop    |
|---|---------|---------|--------|
| 1 | Cork    | Ireland | 119230 |
| 3 | Swansea | Wales   | 239000 |


Selecting data based on user defined filters is very easy.

In [14]:
citiesDF[citiesDF['pop'] < 100000] # cities with pop < 100,000

|   | city     | country  | pop   |
|---|----------|----------|-------|
| 0 | Stirling | Scotland | 45750 |
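
Filters can also be combined. The sketch below rebuilds the ```cities``` data and selects the cities with a population between 100,000 and 500,000; the variable name ```mid_sized``` and the thresholds are chosen purely for illustration.

```python
import pandas as pd

cities = {'city': ['Stirling', 'Cork', 'Leeds', 'Swansea'],
          'country': ['Scotland', 'Ireland', 'England', 'Wales'],
          'pop': [45750, 119230, 751500, 239000]}
citiesDF = pd.DataFrame(cities)

# each condition needs its own parentheses; & means "and", | means "or"
mid_sized = citiesDF[(citiesDF['pop'] > 100000) & (citiesDF['pop'] < 500000)]
```

The plain python keywords ```and``` and ```or``` do not work here because the comparison is carried out entry by entry over the whole column.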


Again we've only covered the very, very basics here (more to make you aware of ```pandas``` than anything else) and we would suggest that you visit either the ```pandas``` [documentation](http://pandas.pydata.org/pandas-docs/stable/) (it's excellent), Greg Reda's useful [tutorials](http://www.gregreda.com/2013/10/26/intro-to-pandas-data-structures/) or, if you're really keen, indulge yourself with the ```pandas``` [book](http://www.amazon.co.uk/Python-Data-Analysis-Wrangling-IPython/dp/1449319793).

## Putting it together 1
Using the table below create a python dictionary object with the column headers as keys and the contents as values. You've already seen this technique above. Use this dictionary to create a ```pandas DataFrame``` variable called ```old_samp```.

| Gender | Age | body wt | ht    |
|--------|-----|---------|-------|
| F      | 77  | 63.8    | 155.5 |
| F      | 80  | 56.4    | 160.5 |
| M      | 79  | 75.5    | 171   |
| M      | 75  | 83.9    | 178.5 |

Use the ```pandas describe()``` method on this ```DataFrame```. What is the output from ```describe()``` telling you? Which column from the original data is missing from this output and why? Use the techniques for filtering data described above to get only the male data, only the female data. 

In [57]:
# solution - remove from student doc
import pandas as pd
dct = {'sex':['F', 'F', 'M', 'M'], 'age':[77,80,79,75], 'wgt':[63.8,56.4,75.5,83.9], 'ht':[155.5,160.5,171,178.5]}
old_samp = pd.DataFrame(dct)
old_samp[old_samp['sex']=='M']
old_samp[old_samp['sex']=='F']
old_samp.describe()
# output summarises all the numerical data, that's why sex is missing - it's categorical data and has no
# meaningful mean, etc

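|       | age      | ht         | wgt       |
|-------|----------|------------|-----------|
| count | 4.0      | 4.0        | 4.0       |
| mean  | 77.75    | 166.375    | 69.9      |
| std   | 2.217356 | 10.347101  | 12.204098 |
| min   | 75.0     | 155.5      | 56.4      |
| 25%   | 76.5     | 159.25     | 61.95     |
| 50%   | 78.0     | 165.75     | 69.65     |
| 75%   | 79.25    | 172.875    | 77.6      |
| max   | 80.0     | 178.5      | 83.9      |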


## Reading Data with ```pandas```

One of the most useful things about ```pandas``` is that it makes opening, examining and analysing data files like csv files much easier. The data we'll use are a set of marathon winning times for the years 1970 to 2000 for men and women. The original dataset is available (along with many others) at the [OpenIntro](https://www.openintro.org/) website. These data are in a ```csv``` file and we'll read that into a ```pandas``` DataFrame object using the [```read_csv```](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.io.parsers.read_csv.html) function. Then we'll examine the first 5 rows of the data to make sure it looks ok using the ```head()``` method on our DataFrame.

In [6]:
data_in = pd.read_csv('../data/marathon.csv', sep=',')
data_in.head()

|   | Year | Gender | Time    |
|---|------|--------|---------|
| 0 | 1980 | m      | 2.16139 |
| 1 | 1981 | m      | 2.13694 |
| 2 | 1982 | m      | 2.15806 |
| 3 | 1983 | m      | 2.14972 |
| 4 | 1984 | m      | 2.24806 |


Hopefully you can see how much more convenient this is compared to generating a file location variable and then a file handle. We still need a location but we just point ```pandas``` at the file and give it information on how the columns are divided.
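
If you don't have the data file to hand you can still experiment with ```read_csv```, because it will read from any file-like object. The sketch below wraps a few invented rows (same column layout, made-up values) in an in-memory text buffer using ```io.StringIO``` from the standard library.

```python
import io
import pandas as pd

# invented rows in the same Year,Gender,Time layout as the marathon file
csv_text = u"Year,Gender,Time\n2001,m,2.10\n2001,f,2.40\n2002,m,2.12\n"

toy = pd.read_csv(io.StringIO(csv_text), sep=',')  # sep says how columns are divided
first_rows = toy.head(2)                           # just the first two rows
```

For a tab separated file the only change would be ```sep='\t'```.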

We can get information about the DataFrame *structure* using the ```.info()``` method on our DataFrame. This tells us we have 59 rows and 3 variables (see below). ```Year``` is an integer, ```Gender``` is an object (here a string) and ```Time``` is a float. Note that ```Gender``` is reported as an ```object``` rather than a ```string``` because of the way ```pandas``` works internally (it relies on another module called ```numpy```).

In [17]:
data_in.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 59 entries, 0 to 58
Data columns (total 3 columns):
Year      59 non-null int64
Gender    59 non-null object
Time      59 non-null float64
dtypes: float64(1), int64(1), object(1)
memory usage: 1.8+ KB


We can also get the column headers alone.

In [18]:
data_in.columns

Index([u'Year', u'Gender', u'Time'], dtype='object')

The data types returned by ```pandas``` are based on those of the ```numpy``` module. We can look at the datatypes more explicitly by looking at the type of the first entry in each column (```.ix[0]``` below).

In [19]:
for i in data_in.columns:
    print "Column %s is %s." % (i, type(data_in[i].ix[0])) # type of the first data entry (row 0) in every column

Column Year is <type 'numpy.int64'>.
Column Gender is <type 'str'>.
Column Time is <type 'numpy.float64'>.


Here we have created a loop that has gone over the *columns* of the data and pulled out the ```type``` of the first entry in each column (i.e. the ```type(data_in[i].ix[0])``` bit).
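
A shorter route to similar information is the ```dtypes``` attribute, which reports the storage type of every column in one go. The sketch below uses a tiny made-up DataFrame (```toy```) rather than the marathon data.

```python
import pandas as pd

# two invented rows, for illustration only
toy = pd.DataFrame({'Year': [2001, 2002],
                    'Gender': ['m', 'f'],
                    'Time': [2.10, 2.40]})

col_types = toy.dtypes            # a Series: column name -> data type
time_type = col_types['Time']     # look up a single column's type
```

Unlike the loop above, ```dtypes``` describes how the whole column is stored rather than the python type of one particular entry.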

## Basic statistics

Once we have some data we often want to carry out some sort of statistical analysis. Are groups different? Are variables related? The ```pandas``` library has functionality built in for several basic statistical summaries making for a very quick way of examining data.

We can easily pull out some descriptive statistics using the relevant methods.

In [20]:
mean_time = data_in['Time'].mean()
# data_in.Time.mean() alternative syntax
median_time = data_in['Time'].median()
print "The mean marathon winning time is %.2f hours and the median is %.2f hours." % (mean_time, median_time)

The mean marathon winning time is 2.39 hours and the median is 2.42 hours.


So now we know the mean and median winning times but what about the time for each sex? Are these different? We can subset the data by gender and then calculate the mean time.

In [21]:
male_mean_time = data_in[data_in['Gender'] == 'm']['Time'].mean()
female_mean_time = data_in[data_in['Gender']== 'f']['Time'].mean()

# male_mean_time = data_in[data_in.Gender == 'm'].Time.mean() alternative notation
# female_mean_time = data_in[data_in.Gender == 'f'].Time.mean() alternative notation

print "The average winning marathon time for men is %.2f hours and the average winning time for women is %.2f hours." % (male_mean_time, female_mean_time)

The average winning marathon time for men is 2.22 hours and the average winning time for women is 2.57 hours.


In the example above we used subsetting to pull out the male and female data and then summarise each dataset but ```pandas``` provides a ```groupby``` function that makes this kind of subsetting even easier.

In [22]:
mf_data = data_in.groupby('Gender')['Time'].mean()
mf_data

Gender
f    2.566216
m    2.220648
Name: Time, dtype: float64

In [50]:
print "The average winning marathon time for men is %.2f hours and the average winning time for women \
is %.2f hours." % (mf_data.ix['m'], mf_data.ix['f'])
print "The average difference for males and females is %.2f minutes." % ((mf_data.ix['f'] - mf_data.ix['m'])*60)

The average winning marathon time for men is 2.22 hours and the average winning time for women is 2.57 hours.
The average difference for males and females is 20.73 minutes.


In the example above we generated gender specific mean times by:

* Grouping the data by ```Gender```
* Selecting the ```Time``` column of the grouped data
* Passing the grouped ```Time``` to the ```mean``` function

So we can see that male winning times tend to be ~0.35 hours (about 21 minutes) faster than female winning times over the 30 year span of the data. In the next unit we'll see how to visualise data like this.
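
The three steps can be tried on a small example. The sketch below uses four invented winning times (not the real data) and also shows the ```agg``` method, which computes several summaries of each group at once.

```python
import pandas as pd

# invented times in hours, for illustration only
toy = pd.DataFrame({'Gender': ['m', 'm', 'f', 'f'],
                    'Time': [2.0, 2.2, 2.4, 2.6]})

grouped = toy.groupby('Gender')['Time']         # steps 1 and 2: group, then pick a column
means = grouped.mean()                          # step 3: one mean per group
summary = grouped.agg(['mean', 'min', 'max'])   # several summaries per group
gap_minutes = (means['f'] - means['m']) * 60    # difference converted to minutes
```

Here ```means``` is a ```Series``` indexed by the group labels, while ```summary``` is a ```DataFrame``` with one row per group and one column per statistic.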

## Putting it together 1

Using ```pandas``` read in the elderly csv file from the data directory and calculate an overall mean and sex specific means for height and weight.

In [21]:
# solution - remove from student doc

import pandas as pd
data_in = pd.read_csv('../data/elderlyHeightWeight.csv', sep='\t')
data_in.head()

mn_height = data_in['ht'].mean()
mn_weight = data_in['body wt'].mean()
print 'The mean height is %.2fcm; the mean weight is %.2fkg' % (mn_height, mn_weight)

The mean height is 165.75cm; the mean weight is 69.12kg


In [20]:
grouped_ht = data_in.groupby('Gender')['ht'].mean()
grouped_wt = data_in.groupby('Gender')['body wt'].mean()

print 'The mean height for females is %.2fcm; the mean height for males is %.2fcm' % (grouped_ht['F'], grouped_ht['M']) # look up each group by its label
print 'The mean weight for females is %.2fkg; the mean weight for males is %.2fkg' % (grouped_wt['F'], grouped_wt['M'])

The mean height for females is 158.25cm; the mean height for males is 171.75cm
The mean weight for females is 59.39kg; the mean weight for males is 76.91kg


## Homework

The run10.txt file in the data directory contains *tab separated* results from the Cherry Blossom 10 mile run in Washington DC from 2012 (see [OpenIntro book](https://www.openintro.org/download.php?file=os3&referrer=/stat/textbook.php), section 4.1.3). Read this data into a ```pandas``` DataFrame using the read_csv function. How many data points (rows) are there? 

The data include:

* Finishing Position
* Time
* Age
* Gender
* State of Origin

What was the average time for the race? What was the median time for each sex? What was the time for the fastest and slowest runners? Which state was represented by the most runners?

Hint: For the last question you might consider extracting the state data and using the [```value_counts```](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.value_counts.html) and [```idxmax```](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.idxmax.html?highlight=idxmax#pandas.Series.idxmax) functions.
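
To see how those two methods fit together, here is a sketch on a handful of invented state codes (not the race data).

```python
import pandas as pd

states = pd.Series(['NY', 'TX', 'NY', 'CA', 'NY', 'TX'])  # invented values

counts = states.value_counts()   # how many times each distinct value occurs
most_common = counts.idxmax()    # index label belonging to the largest count
```

```value_counts``` returns a ```Series``` whose index holds the distinct values, so ```idxmax``` hands back a value from the original data rather than a row number.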

In [22]:
## Solution - remove from student docs
run_data = pd.read_csv('../data/run10.txt', sep='\t')
#print run_data.head(n=5)
print len(run_data)
print run_data['time'].mean() # 94.52
print run_data.groupby('gender')['time'].median()
print run_data['time'][run_data['time']==min(run_data['time'])] # fastest = 45 mins
print run_data['time'][run_data['time']==max(run_data['time'])] # slowest = 171 mins
print run_data['state'].value_counts().idxmax()

16924
94.5191922713
gender
F    98.03
M    87.47
Name: time, dtype: float64
7749    45.25
Name: time, dtype: float64
997    170.97
Name: time, dtype: float64
VA


## Homework 2

The file ```states.csv``` in the ```data``` folder contains a comma separated file with the name of each US state and the state two letter abbreviation. In the data for the Cherry Blossom run there are runners who do not come from one of these states. Read the ```states.csv``` file into a DataFrame and use pandas to extract the non-state runners. How many come from each region that is NOT a US state?

Hint: You might want to consider using the pandas [```isin```](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.isin.html) function to create a boolean mask for the run data and then the [```value_counts```](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.value_counts.html) function to tabulate your results.
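
The general pattern looks something like the sketch below, which uses invented values and variable names rather than the real files. ```isin``` gives a boolean mask and the ```~``` operator flips it, so that we keep the entries that are *not* in the reference list.

```python
import pandas as pd

known = pd.Series(['NY', 'TX', 'CA'])                               # invented reference list
origin = pd.Series(['NY', 'Canada', 'TX', 'Kenya', 'Canada'])      # invented runner origins

mask = origin.isin(known)              # True where the value appears in known
others = origin[~mask]                 # ~ inverts the mask: keep the non-matches
other_counts = others.value_counts()   # tabulate what is left
```

Writing ```mask == False``` instead of ```~mask``` selects the same rows.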
 

In [23]:
# solution - remove from student doc
states = pd.read_csv('../data/states.csv')
# which values in states column are not a state
non_states = run_data['state'][run_data['state'].isin(states['abbr'])==False]
print non_states.value_counts()

DC          3986
Canada         8
Kenya          4
AP             3
Norway         3
Ethiopia       3
AE             2
Poland         1
Ukraine        1
PR             1
Bolivia        1
dtype: int64
