# Series


In [1]:
import pandas as pd 

## Create a Series Object from a List
* A pandas **Series** is a one-dimensional labelled array.
* A Series combines the best features of a list and a dictionary.
* A Series maintains a single collection of ordered values (i.e. a single column of data).
* We can assign each value an identifier, which does not have to *be* unique.

In [2]:
ice_cream = ['Chocolate', 'Vanilla', 'Strawberry', 'Rum Raisin']
pd.Series(ice_cream)

0     Chocolate
1       Vanilla
2    Strawberry
3    Rum Raisin
dtype: object

In [3]:
lottery_numbers = [4, 8, 15, 16, 23, 42]
pd.Series(lottery_numbers)

0     4
1     8
2    15
3    16
4    23
5    42
dtype: int64

In [4]:
registrations = [True, False, False, False, True]
pd.Series(registrations)

0     True
1    False
2    False
3    False
4     True
dtype: bool

## Create a Series Object from a Dictionary

In [5]:
sushi = {
  "Salmon": "Orange",
  "Tuna":  "Red",
  "Eel": "Brown"
}

pd.Series(sushi)

Salmon    Orange
Tuna         Red
Eel        Brown
dtype: object

In [6]:
sushi = {
  "Salmon": {
    "Color": "Orange"
  },
  "Tuna":  {
    "Color": "Red"
  },
  "Eel": {
    "Color": "Brown"
  },
}

pd.Series(sushi)

Salmon    {'Color': 'Orange'}
Tuna         {'Color': 'Red'}
Eel        {'Color': 'Brown'}
dtype: object

## Intro to Series Methods

In [7]:
prices = pd.Series([2.99, 4.43, 1.34])
prices

0    2.99
1    4.43
2    1.34
dtype: float64

In [8]:
prices.sum()

8.76

In [9]:
prices.product()

17.749238000000002

In [10]:
prices.mean()

2.92

In [11]:
prices.std()

1.5461888629789053

## Intro to Attributes
* An **attribute** is a piece of data that lives on an object.
* An **attribute** is a fact, a detail, a characteristic of the object.
* Access an attribute with `object.attribute` syntax.

In [12]:
adjectives = pd.Series(["Smart", "Handsome", "Charming", "Brilliant", "Humble", "Smart"])
adjectives


0        Smart
1     Handsome
2     Charming
3    Brilliant
4       Humble
5        Smart
dtype: object

In [13]:
adjectives.size

6

In [14]:
adjectives.is_unique

False

In [15]:
adjectives.values

array(['Smart', 'Handsome', 'Charming', 'Brilliant', 'Humble', 'Smart'],
      dtype=object)

In [16]:
adjectives.index

RangeIndex(start=0, stop=6, step=1)

In [17]:
type(adjectives.values)

numpy.ndarray

In [18]:
type(adjectives.index)

pandas.core.indexes.range.RangeIndex

## Parameters and Arguments
* A **parameter** is the name for an expected input to a function/method/class instantiation.
* An **argument** is the concrete value we provide for a parameter during invocation.

In [19]:
fruits = ["Apple", "Orange", "Plum", "Grape", "Blueberry", "Watermelon"]
weekdays = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Monday"]

In [20]:
pd.Series(fruits)

0         Apple
1        Orange
2          Plum
3         Grape
4     Blueberry
5    Watermelon
dtype: object

In [21]:
pd.Series(fruits, weekdays)

Monday            Apple
Tuesday          Orange
Wednesday          Plum
Thursday          Grape
Friday        Blueberry
Monday       Watermelon
dtype: object

In [22]:
pd.Series(weekdays, fruits)

Apple            Monday
Orange          Tuesday
Plum          Wednesday
Grape          Thursday
Blueberry        Friday
Watermelon       Monday
dtype: object
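
The cells above pass arguments by position, so swapping their order changes which list becomes the values and which becomes the index. As a small sketch of the alternative, the same call can name its parameters explicitly (`data` and `index` are the first two parameters of the `Series` constructor); the shortened lists here are only for illustration.

```python
import pandas as pd

fruits = ["Apple", "Orange", "Plum"]
weekdays = ["Monday", "Tuesday", "Wednesday"]

# Positional arguments: the first fills `data`, the second fills `index`.
by_position = pd.Series(fruits, weekdays)

# Keyword arguments: each parameter is named, so the order no longer matters.
by_keyword = pd.Series(index=weekdays, data=fruits)

print(by_position.equals(by_keyword))  # True
```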

## Import Series with the pd.read_csv Function

* A **CSV** is a plain text file that uses line breaks to separate rows and commas to separate row values.
* Pandas ships with many different `read_` functions for different types of files.
* The `read_csv` function accepts many different parameters. The first one specifies the file name/path.
* The `read_csv` function will import the dataset as a **DataFrame**, a 2D table.
* The `usecols` parameter accepts a list of the column(s) to import.
* The `squeeze` method converts a single-column **DataFrame** to a **Series**.

In [23]:
pokemon = pd.read_csv("./pandas/Complete/pokemon.csv", usecols=["Name"]).squeeze("columns")
pokemon

0          Bulbasaur
1            Ivysaur
2           Venusaur
3         Charmander
4         Charmeleon
            ...     
1005    Iron Valiant
1006        Koraidon
1007        Miraidon
1008    Walking Wake
1009     Iron Leaves
Name: Name, Length: 1010, dtype: object

In [24]:
google = pd.read_csv("./pandas/Complete/google_stock_price.csv", usecols=["Price"]).squeeze("columns")
google

0         2.490664
1         2.515820
2         2.758411
3         2.770615
4         2.614201
           ...    
4788    132.080002
4789    132.998001
4790    135.570007
4791    137.050003
4792    138.429993
Name: Price, Length: 4793, dtype: float64
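
The cells above depend on CSV files on disk. To try the same `read_csv`, `usecols`, and `squeeze` steps without those files, here is a rough sketch that reads a tiny CSV from an in-memory string. The three-row CSV text is a stand-in made up for illustration, not the real dataset.

```python
import io

import pandas as pd

# Hypothetical CSV content held in memory, standing in for a file on disk.
csv_text = (
    "Name,Type\n"
    'Bulbasaur,"Grass, Poison"\n'
    'Ivysaur,"Grass, Poison"\n'
    "Charmander,Fire\n"
)

# read_csv returns a DataFrame, even when only one column is imported.
df = pd.read_csv(io.StringIO(csv_text), usecols=["Name"])

# squeeze("columns") collapses the single-column DataFrame into a Series.
names = df.squeeze("columns")

print(type(df).__name__)     # DataFrame
print(type(names).__name__)  # Series
```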

## The head and tail Methods
* The `head` method returns a number of rows from the top/beginning of the `Series`.
* The `tail` method returns a number of rows from the bottom/end of the `Series`.


In [25]:
pokemon.head()

0     Bulbasaur
1       Ivysaur
2      Venusaur
3    Charmander
4    Charmeleon
Name: Name, dtype: object

In [26]:
google.tail()

4788    132.080002
4789    132.998001
4790    135.570007
4791    137.050003
4792    138.429993
Name: Price, dtype: float64
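
Both methods return five rows when called without arguments, as in the cells above. They also accept the number of rows to return; a short sketch with a made-up Series of ten numbers:

```python
import pandas as pd

numbers = pd.Series(range(10, 20))  # values 10..19 with index 0..9

first_five = numbers.head()     # no argument: five rows
first_three = numbers.head(3)   # the first three rows
last_two = numbers.tail(2)      # the last two rows

print(list(first_three))  # [10, 11, 12]
print(list(last_two))     # [18, 19]
```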

## Passing Series to Python's Built-In Functions

* The `len` function returns the length of the **Series**.
* The `type` function returns the type of an object.
* The `list` function converts the **Series** to a list.
* The `dict` function converts the **Series** to a dictionary.
* The `sorted` function converts the **Series** to a sorted list.
* The `max` function returns the largest value in the **Series**.
* The `min` function returns the smallest value in the **Series**.

In [27]:
pokemon = pd.read_csv("./pandas/Complete/pokemon.csv", usecols=["Name"]).squeeze("columns")
google = pd.read_csv("./pandas/Complete/google_stock_price.csv", usecols=['Price']).squeeze("columns")
pokemon.head()

0     Bulbasaur
1       Ivysaur
2      Venusaur
3    Charmander
4    Charmeleon
Name: Name, dtype: object

In [28]:
len(pokemon)
type(pokemon)
list(pokemon)
sorted(pokemon)
type(sorted(pokemon))
dict(pokemon)

max(google)
min(google)

max(pokemon)
min(pokemon)

'Abomasnow'
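
The cell above shows only the result of its last line. As a small sketch, printing each call on a three-value Series (the same prices used earlier in this notebook) makes every result visible:

```python
import pandas as pd

prices = pd.Series([2.99, 4.43, 1.34])

print(len(prices))     # 3
print(list(prices))    # [2.99, 4.43, 1.34]
print(sorted(prices))  # [1.34, 2.99, 4.43]
print(max(prices), min(prices))  # 4.43 1.34

# dict uses the index labels as keys and the Series values as values.
as_dict = dict(prices)
print(list(as_dict.keys()))  # [0, 1, 2]
```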

## Check for Inclusion with Python's in Keyword

* The `in` keyword checks if a value exists within an object.
* The `in` keyword will look for a value in the **Series's** index.
* Use the `index` and `values` attributes to access "nested" objects within the **Series**.
* Combine the `in` keyword with `values` to search within the **Series's** values.

In [29]:
"car" in "racecar"

True

In [30]:
2 in [3, 2, 1]

True

In [31]:
"Bulbasaur" in pokemon

False

In [32]:
0 in pokemon

True

In [33]:
"Bulbasaur" in pokemon.values

True
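
The same behavior is easier to see on a Series with string labels. A minimal sketch, reusing the sushi colors from earlier in this notebook:

```python
import pandas as pd

colors = pd.Series(["Orange", "Red", "Brown"], index=["Salmon", "Tuna", "Eel"])

print("Tuna" in colors)        # True: "Tuna" is an index label
print("Tuna" in colors.index)  # True: the same check, stated explicitly
print("Red" in colors)         # False: "Red" is a value, not a label
print("Red" in colors.values)  # True: searches the underlying values
```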

## The sort_values Method
* The `sort_values` method sorts a **Series's** values in order.
* By default, pandas applies an ascending sort (smallest to largest).
* Customize the sort order with the `ascending` parameter.

In [34]:
google.head()

0    2.490664
1    2.515820
2    2.758411
3    2.770615
4    2.614201
Name: Price, dtype: float64

In [35]:
google.sort_values(ascending=False)

4395    151.863495
4345    151.000000
4346    150.141754
4336    150.000000
4341    150.000000
           ...    
12        2.515820
11        2.514326
13        2.509095
0         2.490664
10        2.470490
Name: Price, Length: 4793, dtype: float64

In [36]:
pokemon.sort_values(ascending=False)

717      Zygarde
633     Zweilous
40         Zubat
569        Zorua
570      Zoroark
         ...    
680    Aegislash
616     Accelgor
358        Absol
62          Abra
459    Abomasnow
Name: Name, Length: 1010, dtype: object
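
One detail worth seeing on a small example: `sort_values` returns a new Series and leaves the original in place, and each value keeps its original index label. A short sketch with three made-up prices:

```python
import pandas as pd

prices = pd.Series([2.99, 4.43, 1.34])

ascending = prices.sort_values()                  # smallest to largest
descending = prices.sort_values(ascending=False)  # largest to smallest

print(list(ascending))        # [1.34, 2.99, 4.43]
print(list(ascending.index))  # [2, 0, 1]: labels travel with their values
print(list(descending))       # [4.43, 2.99, 1.34]
print(list(prices))           # [2.99, 4.43, 1.34]: the original is unchanged
```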

## The sort_index Method
* The `sort_index` method sorts a **Series** by its index.
* The `sort_index` method also accepts an `ascending` parameter to set the sort order.

In [37]:
pokemon = pd.read_csv("./pandas/Complete/pokemon.csv", index_col="Name").squeeze("columns")
pokemon

Name
Bulbasaur          Grass, Poison
Ivysaur            Grass, Poison
Venusaur           Grass, Poison
Charmander                  Fire
Charmeleon                  Fire
                      ...       
Iron Valiant     Fairy, Fighting
Koraidon        Fighting, Dragon
Miraidon        Electric, Dragon
Walking Wake       Water, Dragon
Iron Leaves       Grass, Psychic
Name: Type, Length: 1010, dtype: object

In [38]:
pokemon.sort_index()

Name
Abomasnow        Grass, Ice
Abra                Psychic
Absol                  Dark
Accelgor                Bug
Aegislash      Steel, Ghost
                  ...      
Zoroark                Dark
Zorua                  Dark
Zubat        Poison, Flying
Zweilous       Dark, Dragon
Zygarde      Dragon, Ground
Name: Type, Length: 1010, dtype: object
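
The cell above uses the default ascending order. As a brief sketch of the `ascending` parameter, here are both directions on a three-item Series built from the sushi colors used earlier:

```python
import pandas as pd

colors = pd.Series(["Orange", "Red", "Brown"], index=["Salmon", "Tuna", "Eel"])

a_to_z = colors.sort_index()                  # ascending by label
z_to_a = colors.sort_index(ascending=False)   # descending by label

print(list(a_to_z.index))  # ['Eel', 'Salmon', 'Tuna']
print(list(z_to_a.index))  # ['Tuna', 'Salmon', 'Eel']
```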

## Extract Series Value by Index Position
* Use the `iloc` accessor to extract a **Series** value by its index position.
* `iloc` is short for "integer location".
* Python's list slicing syntaxes (slices, slices from start, slices to end, etc.) are supported with **Series** objects.

In [39]:
pokemon.iloc[0] # iloc is an attribute

'Grass, Poison'

In [40]:
pokemon.iloc[[100, 200, 300]]

Name
Electrode    Electric
Unown         Psychic
Delcatty       Normal
Name: Type, dtype: object

In [41]:
pokemon.iloc[20:54]

Name
Spearow       Normal, Flying
Fearow        Normal, Flying
Ekans                 Poison
Arbok                 Poison
Pikachu             Electric
Raichu              Electric
Sandshrew             Ground
Sandslash             Ground
Nidoran♀              Poison
Nidorina              Poison
Nidoqueen     Poison, Ground
Nidoran♂              Poison
Nidorino              Poison
Nidoking      Poison, Ground
Clefairy               Fairy
Clefable               Fairy
Vulpix                  Fire
Ninetales               Fire
Jigglypuff     Normal, Fairy
Wigglytuff     Normal, Fairy
Zubat         Poison, Flying
Golbat        Poison, Flying
Oddish         Grass, Poison
Gloom          Grass, Poison
Vileplume      Grass, Poison
Paras             Bug, Grass
Parasect          Bug, Grass
Venonat          Bug, Poison
Venomoth         Bug, Poison
Diglett               Ground
Dugtrio               Ground
Meowth                Normal
Persian               Normal
Psyduck                Water
Name: Type, dtype: object
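
The slicing variants listed in this section can be sketched on a five-item Series. The letters are placeholder data; as with Python lists, the end of a slice is exclusive and negative positions count from the end.

```python
import pandas as pd

letters = pd.Series(["a", "b", "c", "d", "e"])

print(letters.iloc[-1])         # e: the last position
print(list(letters.iloc[:2]))   # ['a', 'b']: slice from the start
print(list(letters.iloc[3:]))   # ['d', 'e']: slice to the end
print(list(letters.iloc[1:4]))  # ['b', 'c', 'd']: position 4 is excluded
```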

## Extract Series Value by Index Label
* Use the `loc` accessor to extract a **Series** value by its index label.
* Pass a list to extract multiple values by index label.
* If any index label (for `loc`) or position (for `iloc`) in the list does not exist, pandas will raise an error.

In [42]:
pokemon = pd.read_csv("./pandas/Complete/pokemon.csv", index_col="Name").squeeze("columns")
pokemon.head()

Name
Bulbasaur     Grass, Poison
Ivysaur       Grass, Poison
Venusaur      Grass, Poison
Charmander             Fire
Charmeleon             Fire
Name: Type, dtype: object

In [43]:
pokemon.loc["Bulbasaur"]

'Grass, Poison'
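
The cell above extracts a single label. A small sketch of the other two points in this section, passing a list of labels and hitting a missing label, using the sushi colors from earlier. "Shrimp" is a deliberately absent label, and the `KeyError` shown is what recent pandas versions raise in this case.

```python
import pandas as pd

colors = pd.Series(["Orange", "Red", "Brown"], index=["Salmon", "Tuna", "Eel"])

# A list of labels returns a Series, in the order the labels were requested.
pair = colors.loc[["Eel", "Salmon"]]
print(list(pair))  # ['Brown', 'Orange']

# A list containing a label that is not in the index raises a KeyError.
try:
    colors.loc[["Eel", "Shrimp"]]
    raised = False
except KeyError:
    raised = True

print(raised)  # True
```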

## The get Method on a Series
* The `get` method extracts a **Series** value by index label. It is an alternative to square brackets.
* The `get` method's second argument sets the fallback value to return if the label/position does not exist.

In [44]:
pokemon.head()

Name
Bulbasaur     Grass, Poison
Ivysaur       Grass, Poison
Venusaur      Grass, Poison
Charmander             Fire
Charmeleon             Fire
Name: Type, dtype: object

In [45]:
pokemon.get("Moltres")

'Fire, Flying'

In [46]:
pokemon.get("Digimon", default="NAP (Not a Pokemon)")

'NAP (Not a Pokemon)'
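
When no fallback is supplied, `get` returns `None` for a missing label instead of raising an error. A minimal sketch with the sushi colors from earlier ("Shrimp" is a label chosen because it is absent):

```python
import pandas as pd

colors = pd.Series(["Orange", "Red", "Brown"], index=["Salmon", "Tuna", "Eel"])

print(colors.get("Tuna"))                        # Red
print(colors.get("Shrimp"))                      # None
print(colors.get("Shrimp", default="Unknown"))   # Unknown
```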

## Overwrite a Series Value
* Use the `loc/iloc` accessor to target an index label/position, then use an equal sign to provide a new value.

In [47]:
pokemon = pd.read_csv("./pandas/Complete/pokemon.csv", usecols=["Name"]).squeeze("columns")
pokemon.head()

0     Bulbasaur
1       Ivysaur
2      Venusaur
3    Charmander
4    Charmeleon
Name: Name, dtype: object

In [48]:
pokemon.iloc[0]

'Bulbasaur'

In [49]:
pokemon.iloc[0] = "Borisaur"
pokemon.head()

0      Borisaur
1       Ivysaur
2      Venusaur
3    Charmander
4    Charmeleon
Name: Name, dtype: object

In [50]:
pokemon.iloc[[1, 2, 4]] = ["Firemon", "Blazemon", "Flamenon"]

In [51]:
pokemon.head()

0      Borisaur
1       Firemon
2      Blazemon
3    Charmander
4      Flamenon
Name: Name, dtype: object

In [52]:
pokemon = pd.read_csv("./pandas/Complete/pokemon.csv", index_col=["Name"]).squeeze("columns")
pokemon.head()

Name
Bulbasaur     Grass, Poison
Ivysaur       Grass, Poison
Venusaur      Grass, Poison
Charmander             Fire
Charmeleon             Fire
Name: Type, dtype: object

In [53]:
pokemon.loc["Bulbasaur"] = 'Awesomeness'

In [54]:
pokemon.head()

Name
Bulbasaur       Awesomeness
Ivysaur       Grass, Poison
Venusaur      Grass, Poison
Charmander             Fire
Charmeleon             Fire
Name: Type, dtype: object

## The copy Method
* A **copy** is a duplicate/replica of an object.
* Changes to a copy do not modify the original object.
* A **view** is a different way of looking at the same data.
* Changes to a view *do* modify the original object.
* The `copy` method creates a copy of a pandas object.

In [55]:
pokemon_df = pd.read_csv("./pandas/Complete/pokemon.csv", usecols=["Name"])
pokemon_series = pokemon_df.squeeze("columns")

In [56]:
pokemon_df

              Name
0        Bulbasaur
1          Ivysaur
2         Venusaur
3       Charmander
4       Charmeleon
...            ...
1005  Iron Valiant
1006      Koraidon
1007      Miraidon
1008  Walking Wake


In [57]:
pokemon_series

0          Bulbasaur
1            Ivysaur
2           Venusaur
3         Charmander
4         Charmeleon
            ...     
1005    Iron Valiant
1006        Koraidon
1007        Miraidon
1008    Walking Wake
1009     Iron Leaves
Name: Name, Length: 1010, dtype: object

In [58]:
pokemon_series[0] = 'Whatever'
pokemon_series.head()

0      Whatever
1       Ivysaur
2      Venusaur
3    Charmander
4    Charmeleon
Name: Name, dtype: object

In [59]:
pokemon_df.head()

         Name
0    Whatever
1     Ivysaur
2    Venusaur
3  Charmander
4  Charmeleon
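
The cells in this section show a change to the squeezed Series reaching the DataFrame; whether that happens can depend on the pandas version and its Copy-on-Write setting. Calling `copy` removes the ambiguity. A rough sketch, with a three-row DataFrame built in memory as a stand-in for the CSV file:

```python
import pandas as pd

# Small stand-in for the DataFrame loaded from the CSV file.
df = pd.DataFrame({"Name": ["Bulbasaur", "Ivysaur", "Venusaur"]})

# copy() produces an independent Series that shares nothing with df.
names = df.squeeze("columns").copy()
names.iloc[0] = "Whatever"

print(names.iloc[0])       # Whatever
print(df["Name"].iloc[0])  # Bulbasaur: the DataFrame is untouched
```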


## Math Methods on Series Objects
* The `count` method returns the number of values in the **Series**. It excludes missing values; the `size` attribute includes missing values.
* The `sum` method adds together the **Series'** values.
* The `product` method multiplies together the **Series'** values.
* The `mean` method calculates the average of the **Series'** values.
* The `std` method calculates the standard deviation of the **Series'** values.
* The `max` method returns the largest value in the **Series**.
* The `min` method returns the smallest value in the **Series**.
* The `median` method returns the median value in the **Series**.
* The `mode` method returns the mode in the **Series**.
* The `describe` method returns a summary of several statistics (count, mean, standard deviation, minimum, quartiles, and maximum).

In [60]:
google = pd.read_csv("./pandas/Complete/google_stock_price.csv", usecols=["Price"]).squeeze("columns")
google.head()

0    2.490664
1    2.515820
2    2.758411
3    2.770615
4    2.614201
Name: Price, dtype: float64

In [61]:
google.count()

4793

In [62]:
google.sum()

192733.129338

In [64]:
google.product()  # the result is too large for a float64, so it overflows to infinity (inf)
pd.Series([1, 2, 3 , 4]).product()

  return umr_prod(a, axis, dtype, out, keepdims, initial, where)


24

In [65]:
google.mean()

40.211376870018775

In [66]:
google.std()

37.274752943868094

In [67]:
google.max()

151.863495

In [68]:
google.min()

2.47049

In [69]:
google.median()

26.327717

In [70]:
google.mode()

0    14.719826
1    49.000000
Name: Price, dtype: float64

In [71]:
google.describe()

count    4793.000000
mean       40.211377
std        37.274753
min         2.470490
25%        12.767395
50%        26.327717
75%        56.311001
max       151.863495
Name: Price, dtype: float64
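
The stock price data has no missing values, so `count` and `size` agree there. A short sketch with a made-up Series that contains two missing values shows the difference, along with the default behavior of `sum` and `mean`, which skip missing values:

```python
import pandas as pd

# None is stored as NaN (a missing value) in a float64 Series.
readings = pd.Series([1.5, None, 3.0, None])

print(readings.size)     # 4: every slot, including missing values
print(readings.count())  # 2: only the non-missing values
print(readings.sum())    # 4.5: missing values are skipped by default
print(readings.mean())   # 2.25: the average of the two present values
```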