# Series methods

In [3]:
import pandas as pd

# Read the data. The squeeze method coerces a one-column DataFrame into a Series.
# (The squeeze parameter of read_csv was deprecated in pandas 1.4 and removed in 2.0.)
pokemon = pd.read_csv("../data/pokemon.csv", index_col = "Pokemon").squeeze("columns")
google = pd.read_csv("../data/google_stocks.csv", parse_dates = ["Date"], index_col = "Date").squeeze("columns")

In [4]:
pokemon

Pokemon
Bulbasaur      Grass / Poison
Ivysaur        Grass / Poison
Venusaur       Grass / Poison
Charmander               Fire
Charmeleon               Fire
                    ...      
Stakataka        Rock / Steel
Blacephalon      Fire / Ghost
Zeraora              Electric
Meltan                  Steel
Melmetal                Steel
Name: Type, Length: 809, dtype: object

In [8]:
google.sort_values()

Date
2004-09-03      49.82
2004-09-01      49.94
2004-08-19      49.98
2004-09-02      50.57
2004-09-07      50.60
               ...   
2019-04-23    1264.55
2019-10-25    1265.13
2018-07-26    1268.33
2019-04-26    1272.18
2019-04-29    1287.58
Name: Close, Length: 3824, dtype: float64

## Overwriting a Series with the inplace parameter

In [9]:
pokemon.sort_index(ascending = True)

Pokemon
Abomasnow        Grass / Ice
Abra                 Psychic
Absol                   Dark
Accelgor                 Bug
Aegislash      Steel / Ghost
                  ...       
Zoroark                 Dark
Zorua                   Dark
Zubat        Poison / Flying
Zweilous       Dark / Dragon
Zygarde      Dragon / Ground
Name: Type, Length: 809, dtype: object

In [10]:
pokemon

Pokemon
Bulbasaur      Grass / Poison
Ivysaur        Grass / Poison
Venusaur       Grass / Poison
Charmander               Fire
Charmeleon               Fire
                    ...      
Stakataka        Rock / Steel
Blacephalon      Fire / Ghost
Zeraora              Electric
Meltan                  Steel
Melmetal                Steel
Name: Type, Length: 809, dtype: object

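What if we wanted to modify the <code>Series</code> itself? The <code>sort_index</code> call above returned a new, sorted <code>Series</code> and left <code>pokemon</code> unchanged. Many methods in pandas include an <code>inplace</code> parameter that, when passed an argument of <code>True</code>, appears to modify the object on which the method is invoked.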

In [11]:
pokemon.sort_index(ascending = True, inplace = True)

In [12]:
pokemon

Pokemon
Abomasnow        Grass / Ice
Abra                 Psychic
Absol                   Dark
Accelgor                 Bug
Aegislash      Steel / Ghost
                  ...       
Zoroark                 Dark
Zorua                   Dark
Zubat        Poison / Flying
Zweilous       Dark / Dragon
Zygarde      Dragon / Ground
Name: Type, Length: 809, dtype: object
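
The difference between the two calls can be checked on a small example. The sketch below uses a hypothetical three-row stand-in for the <code>pokemon</code> <code>Series</code> (the <code>types</code> variable and its values are made up for illustration). It shows that a plain <code>sort_index</code> call leaves the original untouched, while <code>inplace = True</code> changes the object and returns <code>None</code>.

```python
import pandas as pd

# Hypothetical miniature stand-in for the pokemon Series
types = pd.Series(
    ["Fire", "Grass / Poison", "Psychic"],
    index=["Charmander", "Bulbasaur", "Abra"],
    name="Type",
)

# Without inplace, sort_index returns a new Series; the original keeps its order
sorted_copy = types.sort_index()
original_order = types.index.tolist()
print(original_order)              # ['Charmander', 'Bulbasaur', 'Abra']
print(sorted_copy.index.tolist())  # ['Abra', 'Bulbasaur', 'Charmander']

# With inplace=True, the method returns None and the variable now holds sorted data
returned = types.sort_index(inplace=True)
print(returned)                    # None
print(types.index.tolist())        # ['Abra', 'Bulbasaur', 'Charmander']
```

Reassigning the result, as in <code>types = types.sort_index()</code>, reaches the same end state without the <code>inplace</code> parameter.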

## Counting values with the value_counts method 

In [13]:
pokemon.value_counts()

Normal                65
Water                 61
Grass                 38
Psychic               35
Fire                  30
                      ..
Psychic / Grass        1
Psychic / Fighting     1
Rock / Poison          1
Bug / Ground           1
Dragon / Electric      1
Name: Type, Length: 159, dtype: int64

We may be more interested in the share of each Pokémon type relative to the whole data set. Set the <code>value_counts</code> method’s <code>normalize</code> parameter to <code>True</code> to return the relative frequency of each unique value. A value’s frequency is the proportion of the data set that the value makes up:

In [14]:
pokemon.value_counts(normalize = True).head()

Normal     0.080346
Water      0.075402
Grass      0.046972
Psychic    0.043263
Fire       0.037083
Name: Type, dtype: float64

We can multiply the values in the frequency <code>Series</code> by 100 to get the percentage that each Pokémon type contributes to the whole:

In [15]:
pokemon.value_counts(normalize = True).head() * 100

Normal     8.034611
Water      7.540173
Grass      4.697157
Psychic    4.326329
Fire       3.708282
Name: Type, dtype: float64
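
To see where the frequencies come from, here is a minimal sketch on a hypothetical five-value <code>Series</code> (the data is invented for illustration). It assumes no missing values, in which case a normalized count is simply the raw count divided by the number of values.

```python
import pandas as pd

# Hypothetical data: three Fire, one Water, one Grass
types = pd.Series(["Fire", "Water", "Fire", "Grass", "Fire"])

counts = types.value_counts()
freqs = types.value_counts(normalize=True)

# Dividing the raw counts by the number of values reproduces the frequencies
manual = counts / len(types)

print(counts["Fire"])       # 3
print(freqs["Fire"])        # 0.6
print(freqs["Fire"] * 100)  # 60.0
```

The frequencies always add up to 1, so the percentages add up to 100.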

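For numeric data such as the <code>google</code> stock prices, we can group values into intervals. We define the intervals as values in a list and pass the list to the <code>value_counts</code> method’s <code>bins</code> parameter. Pandas will use each pair of consecutive list values as the lower and upper ends of an interval. The <code>describe</code> method first gives us a sense of the range that the intervals need to cover: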

In [16]:
google.describe()

count    3824.000000
mean      479.945860
std       328.528592
min        49.820000
25%       235.860000
50%       314.680000
75%       708.205000
max      1287.580000
Name: Close, dtype: float64

In [20]:
buckets = [0, 200, 400, 600, 800, 1000, 1200, 1400]
google.value_counts(bins = buckets, sort = False)

(-0.001, 200.0]      595
(200.0, 400.0]      1568
(400.0, 600.0]       575
(600.0, 800.0]       380
(800.0, 1000.0]      207
(1000.0, 1200.0]     406
(1200.0, 1400.0]      93
Name: Close, dtype: int64

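Notice that the first interval starts at -0.001 instead of 0. When pandas organizes the <code>Series</code> values into buckets, it may extend a bin’s range slightly so that boundary values fall inside a bucket (by up to 0.1% of the data’s range when it computes the bins itself). Here, lowering the first edge ensures that a value of exactly 0 would be counted in the first bucket. The symbols around intervals have significance: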

* A parenthesis marks a value as *excluded* from the interval.
* A square bracket marks a value as *included* in the interval.
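
The bracket notation can be checked directly. The following sketch uses the <code>pd.Interval</code> class and a hypothetical <code>prices</code> <code>Series</code> whose values sit exactly on the bin edges (both the variable names and the numbers are made up for illustration).

```python
import pandas as pd

# A right-closed interval, displayed by pandas as (200, 400]
bucket = pd.Interval(200, 400, closed="right")
print(200 in bucket)  # False: the parenthesis excludes the lower end
print(400 in bucket)  # True: the square bracket includes the upper end

# Hypothetical prices, several of which fall exactly on a bin edge
prices = pd.Series([0, 150, 200, 250, 400])
counts = prices.value_counts(bins=[0, 200, 400], sort=False)

# 0, 150 and 200 land in the first bin; 250 and 400 land in the second
print(counts.tolist())  # [3, 2]
```

The value 200 is counted in the first bin rather than the second, and 0 is counted thanks to the slightly lowered first edge.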

The <code>value_counts</code> method’s <code>bins</code> parameter also accepts an integer argument.

Pandas will automatically calculate the difference between the maximum and minimum values in the <code>Series</code> and divide the range into the specified number of equal-width bins.

The next example splits the stock prices in <code>google</code> into six bins.

In [21]:
google.value_counts(bins = 6, sort = False)

(48.581, 256.113]      1204
(256.113, 462.407]     1104
(462.407, 668.7]        507
(668.7, 874.993]        380
(874.993, 1081.287]     292
(1081.287, 1287.58]     337
Name: Close, dtype: int64
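
The bin edges above can be reproduced by hand. This sketch takes the minimum and maximum closing prices from the earlier <code>describe</code> output and divides the range into six equal parts; the variable names are chosen here for illustration.

```python
# Minimum and maximum closing prices, as reported by google.describe()
lowest, highest = 49.82, 1287.58
num_bins = 6

# Each bin covers an equal share of the overall range
width = (highest - lowest) / num_bins

# Seven edges delimit six bins
edges = [round(lowest + i * width, 3) for i in range(num_bins + 1)]
print(edges)
```

Apart from the first edge, the values match the interval boundaries in the output. Pandas displays the first edge as 48.581 rather than 49.82 because it extends the lower end of the range slightly so that the minimum value is included in the first bin.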

## Invoking a function on every Series value with the apply method

A function is a *first-class object* in Python, which means that the language treats it like any other data type.

Here’s the simplest way to think about first-class objects. Anything that you can do with a number, you can do with a function. You can do all the following things, for example:

* Store a function in a list.
* Assign a function as a value for a dictionary key.
* Pass a function into another function as an argument.
* Return a function from another function.
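
Each item in the list above can be tried in a few lines of plain Python. The helper names in this sketch (<code>shout</code>, <code>make_repeater</code>, <code>call_with</code>) are hypothetical and exist only for illustration.

```python
def shout(text):
    """Return the text in uppercase with an exclamation mark."""
    return text.upper() + "!"

def make_repeater(times):
    """Return a new function that repeats its argument `times` times."""
    def repeat(text):
        return text * times
    return repeat

def call_with(func, value):
    """Invoke whichever function is passed in."""
    return func(value)

# Store functions in a list and as dictionary values
func_list = [len, shout]
func_dict = {"length": len, "shout": shout}

# Pass a function into another function as an argument
print(call_with(shout, "pika"))     # PIKA!

# Return a function from another function
double = make_repeater(2)
print(double("ab"))                 # abab

# Look up a function in the dictionary, then invoke it
print(func_dict["length"]("pika"))  # 4
```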

It’s important to distinguish between a function and a function invocation. 
A *function* is a sequence of instructions that produces an output; it is a “recipe” that has not been cooked yet. By comparison, a function invocation is the actual execution of the instructions; it is the cooking of the recipe.
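
A quick way to see the difference is to compare a function's name with and without parentheses. In this small sketch, the <code>recipe</code> and <code>meal</code> variable names are invented to echo the cooking analogy.

```python
# No parentheses: a reference to the function itself; nothing runs yet
recipe = len
print(recipe is len)     # True
print(callable(recipe))  # True

# Parentheses: the function is invoked and produces a return value
meal = recipe("pandas")
print(meal)              # 6
```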

The next example declares a <code>funcs</code> list that stores three Python built-in functions.
The <code>len</code>, <code>max</code>, and <code>min</code> functions are not invoked within the list. The list stores references to the functions themselves:

In [26]:
funcs = [len, max, min]

The next example iterates over the <code>funcs</code> list with a <code>for</code> loop, pairing each function with its name from a <code>func_names</code> list. Over three iterations, the <code>current_func</code> loop variable represents the uninvoked <code>len</code>, <code>max</code>, and <code>min</code> functions.

During each iteration, the loop invokes the <code>current_func</code> function, passes in the <code>google</code> <code>Series</code>, and prints the function’s name along with its return value:

In [27]:
func_names = ["len", "max", "min"]

for name_func, current_func in zip(func_names, funcs):
    print("{}--->{}".format(name_func, current_func(google)))

len--->3824
max--->1287.58
min--->49.82


**The key takeaway here is that we can treat a function like any other object in Python**.

So how does this fact apply to pandas?

The <code>Series</code> has a method called <code>apply</code> that invokes a function once for each <code>Series</code> value and returns a new <code>Series</code> consisting of the return values of the function invocations.

The <code>apply</code> method expects the function it will invoke as its first parameter, <code>func</code>.
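
Conceptually, <code>apply</code> behaves like a loop that calls the function on each value and collects the results under the original index. The sketch below compares the two on a hypothetical three-value <code>Series</code>. It illustrates the idea only and is not how pandas implements the method internally.

```python
import pandas as pd

# Hypothetical prices with a simple string index
prices = pd.Series([1.4, 2.6, 3.2], index=["a", "b", "c"])

# Let pandas invoke round once per value
applied = prices.apply(round)

# The same result, spelled out as an explicit loop over the values
manual = pd.Series([round(value) for value in prices], index=prices.index)

print(applied.tolist())  # [1, 3, 3]
print(manual.tolist())   # [1, 3, 3]
```

Note that <code>round</code> is passed without parentheses: <code>apply</code> receives the function itself and performs the invocations on our behalf.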

The next example passes Python’s built-in <code>round</code> function:

In [28]:
google

Date
2004-08-19      49.98
2004-08-20      53.95
2004-08-23      54.50
2004-08-24      52.24
2004-08-25      52.80
               ...   
2019-10-21    1246.15
2019-10-22    1242.80
2019-10-23    1259.13
2019-10-24    1260.99
2019-10-25    1265.13
Name: Close, Length: 3824, dtype: float64

In [29]:
google.apply(round)

Date
2004-08-19      50
2004-08-20      54
2004-08-23      54
2004-08-24      52
2004-08-25      53
              ... 
2019-10-21    1246
2019-10-22    1243
2019-10-23    1259
2019-10-24    1261
2019-10-25    1265
Name: Close, Length: 3824, dtype: int64

The <code>apply</code> method also accepts custom functions. Define the function to accept a single parameter and have it return the value that you’d like pandas to store in the new <code>Series</code>.

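Let’s say we wanted to find out how many of our Pokémon have one type (such as Fire) and how many have two or more types. The <code>single_or_multi</code> function below checks a type string for a forward slash, which separates multiple types: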

In [30]:
def single_or_multi(pokemon_type):
    if "/" in pokemon_type:
        return "Multi"
    return "Single"
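
Before handing a custom function to pandas, it can help to call it directly on a few plain strings. This sketch repeats the function definition so that it runs on its own; the sample strings are taken from the type values shown earlier.

```python
def single_or_multi(pokemon_type):
    # A slash separates the types of a multi-type Pokémon
    if "/" in pokemon_type:
        return "Multi"
    return "Single"

print(single_or_multi("Fire"))            # Single
print(single_or_multi("Grass / Poison"))  # Multi

# One invocation per value, which is what apply will do for every row
labels = [single_or_multi(t) for t in ["Fire", "Grass / Poison", "Steel"]]
print(labels)  # ['Single', 'Multi', 'Single']
```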

The next example calls the <code>apply</code> method with the <code>single_or_multi</code> function as its argument. Pandas invokes the <code>single_or_multi</code> function for every <code>Series</code> value:

In [31]:
pokemon

Pokemon
Abomasnow        Grass / Ice
Abra                 Psychic
Absol                   Dark
Accelgor                 Bug
Aegislash      Steel / Ghost
                  ...       
Zoroark                 Dark
Zorua                   Dark
Zubat        Poison / Flying
Zweilous       Dark / Dragon
Zygarde      Dragon / Ground
Name: Type, Length: 809, dtype: object

In [32]:
pokemon.apply(single_or_multi)

Pokemon
Abomasnow     Multi
Abra         Single
Absol        Single
Accelgor     Single
Aegislash     Multi
              ...  
Zoroark      Single
Zorua        Single
Zubat         Multi
Zweilous      Multi
Zygarde       Multi
Name: Type, Length: 809, dtype: object

Let’s find out how many Pokémon fall into each classification by invoking <code>value_counts</code>:

In [33]:
pokemon.apply(single_or_multi).value_counts()

Multi     405
Single    404
Name: Type, dtype: int64