# series

    A pandas Series is a kind of associative array:
    --> it has some dictionary-like properties
    --> it has some sequence-like properties
    
    It is a sequence type, so every element has a definite position in the collection --> positional (implicit) index.
    
    We can also define an explicit index --> a second index.
    
    position  value  label
    0         10     a
    1         20     b
    2         30     c
    3         40     d
    
    We can reference items by positional indices [0] [1]
    or by using the explicit index ['a'] ['b'].
    
    We can have slicing as well as fancy indexing.
    
    We can slice by positional (implicit) index as well as by explicit index. However, there is a slight difference between the two.
    When we slice with the implicit index, the endpoint is excluded.
    When we slice with the explicit index, the endpoint is included.

# A point of confusion

    The explicit index can also be made of integers.
    
       0,   1,   2,   3     <-- implicit (positional) index
    [100, 200, 300, 400]    <-- values
       2,   3,   4,   5     <-- explicit index
     
    
      [2] --> is this using the implicit index or the explicit index?
    [2:3] --> and what about a slice?
    
    There is a rule:
    
    If both the implicit and the explicit index are integers, then
    
      [2] --> explicit index
    [2:3] --> implicit index
    
    
    To remove this ambiguity, pandas provides the iloc and loc indexers.
    
    iloc always uses the implicit (positional) index,
    and loc always uses the explicit index.
    
    These two are not functions; they are attributes (properties) of the series, which is why they are used with square brackets.

In [1]:
import numpy as np
import pandas as pd

In [2]:
s = pd.Series([10, 20, 30], index = list('abc'))
s

a    10
b    20
c    30
dtype: int64

In [3]:
s['a']

10

In [4]:
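s[2] # positional fallback on a non-integer index; newer pandas versions deprecate this in favour of s.iloc[2]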

30

    We can also add new values to the series by directly using the assignment operator.

In [5]:
s['d'] = 500

In [6]:
s

a     10
b     20
c     30
d    500
dtype: int64
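
    As a small sketch of the assignment behaviour described above: assigning to an existing label overwrites the value, while assigning to a new label adds an entry. The labels and values here are only illustrative.

```python
import pandas as pd

s = pd.Series([10, 20, 30], index=list('abc'))

s['b'] = 99    # label already exists: the value is overwritten
s['d'] = 500   # label is new: the series grows by one entry

print(s.to_dict())
```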

    We can also create a series object from a dictionary. At the end of the day, a series object behaves much like an associative array.

In [7]:
capitals = {
    'USA' : 'Washington D.C.',
    'Canada' : 'Ottawa',
    'UK' : 'London',
    'France' : 'Paris'
}

In [8]:
s = pd.Series(capitals)

In [9]:
s

USA       Washington D.C.
Canada             Ottawa
UK                 London
France              Paris
dtype: object
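
    The dictionary-like side of a series can be sketched as follows. Note that membership tests look at the index (the labels), not at the values. The 'Spain' lookup and the 'unknown' default are made up for illustration.

```python
import pandas as pd

capitals = pd.Series({
    'USA': 'Washington D.C.',
    'Canada': 'Ottawa',
    'UK': 'London',
    'France': 'Paris',
})

has_uk = 'UK' in capitals                     # True: 'UK' is a label
has_london = 'London' in capitals             # False: 'London' is a value, not a label
fallback = capitals.get('Spain', 'unknown')   # like dict.get, returns the default for a missing label
labels = list(capitals.keys())                # keys() returns the index

print(has_uk, has_london, fallback, labels)
```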

    We can access the index and values of series using the index and values attributes.

In [10]:
s.index

Index(['USA', 'Canada', 'UK', 'France'], dtype='object')

In [11]:
s.values

array(['Washington D.C.', 'Ottawa', 'London', 'Paris'], dtype=object)

In [12]:
s.items()

<zip at 0x16dfdad78c0>

In [13]:
list(s.items())

[('USA', 'Washington D.C.'),
 ('Canada', 'Ottawa'),
 ('UK', 'London'),
 ('France', 'Paris')]

In [14]:
for country, capital in s.items():
    print(f"Capital({country}) = {capital}")

Capital(USA) = Washington D.C.
Capital(Canada) = Ottawa
Capital(UK) = London
Capital(France) = Paris


    We can also use fancy indexing and boolean masking on a series, as pandas is built on top of the numpy library.

In [15]:
s

USA       Washington D.C.
Canada             Ottawa
UK                 London
France              Paris
dtype: object

In [16]:
s[['USA', 'UK']]

USA    Washington D.C.
UK              London
dtype: object

In [17]:
mask = (s == 'London')
mask

USA       False
Canada    False
UK         True
France    False
dtype: bool

In [18]:
s[mask]

UK    London
dtype: object

In [19]:
type(s[mask])

pandas.core.series.Series
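
    Masks can also be combined. A minimal sketch, assuming we want the entries whose value is either 'London' or 'Paris': the element-wise operator | joins two masks, and isin() is a shorter way to express the same test.

```python
import pandas as pd

s = pd.Series({
    'USA': 'Washington D.C.',
    'Canada': 'Ottawa',
    'UK': 'London',
    'France': 'Paris',
})

# Combine two boolean masks element-wise; each comparison needs its own parentheses
mask = (s == 'London') | (s == 'Paris')
selected = s[mask]

# The same selection written with isin()
same = s[s.isin(['London', 'Paris'])]

print(selected)
```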

    We can also slice a pandas series the same way we slice numpy arrays and lists.

In [20]:
s = pd.Series([i*10 for i in range(1, 11)], index = list('abcdefghij'))
s

a     10
b     20
c     30
d     40
e     50
f     60
g     70
h     80
i     90
j    100
dtype: int64

In [21]:
s.index

Index(['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j'], dtype='object')

In [22]:
s['a': 'g'] # when we do slicing using the explicit indexing then the last index is included

a    10
b    20
c    30
d    40
e    50
f    60
g    70
dtype: int64

In [23]:
s[0:7] # with implicit (positional) slicing the element at position 7 is not included

a    10
b    20
c    30
d    40
e    50
f    60
g    70
dtype: int64

### Point of confusion
    We can also use integers for the explicit index. In this case, confusion arises when slicing and when using the [] operator.

In [24]:
s = pd.Series(list('abcdefghij'), index = [i * 10 for i in range(1, 11)])
s

10     a
20     b
30     c
40     d
50     e
60     f
70     g
80     h
90     i
100    j
dtype: object

In [25]:
s[0] # raises KeyError because 0 is not a label in the explicit index

KeyError: 0

In [26]:
s[10]

'a'

In [27]:
s[0:5] # slicing uses the implicit (positional) index, not the explicit one

10    a
20    b
30    c
40    d
50    e
dtype: object

In [28]:
s[10:50] # slicing is positional here too: positions 10 to 49 do not exist, so the result is empty

Series([], dtype: object)

# iloc and loc properties

In [29]:
s

10     a
20     b
30     c
40     d
50     e
60     f
70     g
80     h
90     i
100    j
dtype: object

In [30]:
s.iloc[0]

'a'

In [31]:
s.iloc[0:6]

10    a
20    b
30    c
40    d
50    e
60    f
dtype: object

In [32]:
s.loc[10]

'a'

In [33]:
s.loc[10:50]

10    a
20    b
30    c
40    d
50    e
dtype: object
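
    The same distinction applies to fancy indexing and to slice endpoints. The sketch below uses a shorter, made-up series with integer labels to compare loc and iloc side by side.

```python
import pandas as pd

s = pd.Series(list('abcde'), index=[10, 20, 30, 40, 50])

by_label = s.loc[[10, 30]]     # fancy indexing by explicit labels
by_position = s.iloc[[0, 2]]   # fancy indexing by positions

label_slice = s.loc[10:30]     # label slice: endpoint 30 is included
position_slice = s.iloc[0:2]   # positional slice: position 2 is excluded

print(list(label_slice), list(position_slice))
```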

# Intro to Series Methods

In [37]:
prices = pd.Series([2.99, 4.45, 1.36])
prices

0    2.99
1    4.45
2    1.36
dtype: float64

In [38]:
print(prices.sum())
print(prices.mean())
print(prices.product())
print(prices.std())

8.8
2.9333333333333336
18.095480000000006
1.5457791994115246
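
    One detail worth knowing about the output above: Series.std() computes the sample standard deviation (ddof=1) by default, while numpy's np.std() defaults to the population version (ddof=0). A minimal sketch of the difference, together with describe() as a quick summary:

```python
import numpy as np
import pandas as pd

prices = pd.Series([2.99, 4.45, 1.36])

sample_std = prices.std()              # default ddof=1 (divides by n - 1)
population_std = prices.std(ddof=0)    # divides by n, like np.std's default

summary = prices.describe()            # count, mean, std, min, quartiles, max

print(sample_std, population_std)
print(summary)
```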


# Intro to Attributes
    An attribute is a piece of data that lives on an object.
    An attribute is a fact, a detail, a characteristic of the object.
    Access an attribute with object.attribute syntax.

In [39]:
adjectives = pd.Series(['Smart', 'Handsome', 'Charming', 'Brilliant', 'Humble', 'Smart'])
adjectives

0        Smart
1     Handsome
2     Charming
3    Brilliant
4       Humble
5        Smart
dtype: object

In [40]:
adjectives.size

6

In [41]:
adjectives.is_unique

False
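
    A few more attributes, sketched on the same data. Attributes are read without parentheses; nunique() is included only as a contrast, since it is a method and therefore needs them.

```python
import pandas as pd

adjectives = pd.Series(['Smart', 'Handsome', 'Charming', 'Brilliant', 'Humble', 'Smart'])

n_items = adjectives.size            # attribute: number of elements
shape = adjectives.shape             # attribute: a one-element tuple for a series
has_missing = adjectives.hasnans     # attribute: True if any value is missing
n_distinct = adjectives.nunique()    # method: number of distinct values

print(n_items, shape, has_missing, n_distinct)
```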

# Import a series with pd.read_csv().squeeze()

    A CSV is a plain text file that uses line breaks to separate rows and commas to separate the values within a row.
    Pandas ships with many different read_ functions for different types of files.
    The read_csv() function accepts many different parameters. The first one specifies the file name/path.
    The read_csv() function imports the dataset as a DataFrame, a 2-dimensional table.
    The usecols parameter accepts a list of the columns to import.
    The squeeze method converts a single-column DataFrame to a Series.
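
    The steps above can be tried without any file on disk. The sketch below reads a small, made-up CSV from an in-memory string (io.StringIO stands in for the file path) and squeezes the single remaining column into a series.

```python
import io
import pandas as pd

# Hypothetical CSV content, used here instead of a real file
csv_text = "Name,Type\nBulbasaur,Grass\nCharmander,Fire\nSquirtle,Water\n"

df = pd.read_csv(io.StringIO(csv_text), usecols=['Name'])  # still a DataFrame with one column
names = df.squeeze('columns')                              # the single column becomes a Series

print(type(df).__name__, type(names).__name__)
```

    Passing 'columns' to squeeze() only collapses the column axis, so a one-row, one-column DataFrame would still give a series and not a bare scalar.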

In [42]:
pokemon = pd.read_csv('datasets/pokemon.csv')
pokemon

              Name              Type
0        Bulbasaur     Grass, Poison
1          Ivysaur     Grass, Poison
2         Venusaur     Grass, Poison
3       Charmander              Fire
4       Charmeleon              Fire
...            ...               ...
1005  Iron Valiant   Fairy, Fighting
1006      Koraidon  Fighting, Dragon
1007      Miraidon  Electric, Dragon
1008  Walking Wake     Water, Dragon


In [43]:
pokemon = pd.read_csv('datasets/pokemon.csv', usecols=['Name'])
pokemon

              Name
0        Bulbasaur
1          Ivysaur
2         Venusaur
3       Charmander
4       Charmeleon
...            ...
1005  Iron Valiant
1006      Koraidon
1007      Miraidon
1008  Walking Wake


In [44]:
pokemon = pd.read_csv('datasets/pokemon.csv', usecols=['Name']).squeeze()
pokemon

0          Bulbasaur
1            Ivysaur
2           Venusaur
3         Charmander
4         Charmeleon
            ...     
1005    Iron Valiant
1006        Koraidon
1007        Miraidon
1008    Walking Wake
1009     Iron Leaves
Name: Name, Length: 1010, dtype: object

# head and tail methods

In [46]:
pokemon = pd.read_csv('datasets/pokemon.csv', usecols=['Name']).squeeze()
pokemon

0          Bulbasaur
1            Ivysaur
2           Venusaur
3         Charmander
4         Charmeleon
            ...     
1005    Iron Valiant
1006        Koraidon
1007        Miraidon
1008    Walking Wake
1009     Iron Leaves
Name: Name, Length: 1010, dtype: object

In [47]:
first_five_pokemons = pokemon.head()
first_five_pokemons

0     Bulbasaur
1       Ivysaur
2      Venusaur
3    Charmander
4    Charmeleon
Name: Name, dtype: object

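    On pandas versions without Copy-on-Write, the head and tail methods return a view of the original data and not a copy, so writing to the result can also change the original. With Copy-on-Write enabled (the default from pandas 3.0), the original stays unchanged.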

In [48]:
first_five_pokemons.iloc[1] = 'Alphasaur'
first_five_pokemons

0     Bulbasaur
1     Alphasaur
2      Venusaur
3    Charmander
4    Charmeleon
Name: Name, dtype: object

In [51]:
pokemon # note that the change made through the head also shows up in the original series

0          Bulbasaur
1          Alphasaur
2           Venusaur
3         Charmander
4         Charmeleon
            ...     
1005    Iron Valiant
1006        Koraidon
1007        Miraidon
1008    Walking Wake
1009     Iron Leaves
Name: Name, Length: 1010, dtype: object
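
    If the original series must stay untouched, one option is to take an explicit copy of the head before modifying it. A minimal sketch with a short, made-up list of names:

```python
import pandas as pd

names = pd.Series(
    ['Bulbasaur', 'Ivysaur', 'Venusaur', 'Charmander', 'Charmeleon', 'Charizard'],
    name='Name',
)

first_three = names.head(3).copy()   # independent copy, not tied to the original data
first_three.iloc[1] = 'Alphasaur'    # only the copy is modified

print(first_three.iloc[1], names.iloc[1])
```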

In [52]:
pokemon.head(10)

0     Bulbasaur
1     Alphasaur
2      Venusaur
3    Charmander
4    Charmeleon
5     Charizard
6      Squirtle
7     Wartortle
8     Blastoise
9      Caterpie
Name: Name, dtype: object

In [53]:
pokemon.tail()

1005    Iron Valiant
1006        Koraidon
1007        Miraidon
1008    Walking Wake
1009     Iron Leaves
Name: Name, dtype: object

In [54]:
pokemon.tail(10)

1000        Wo-Chien
1001       Chien-Pao
1002         Ting-Lu
1003          Chi-Yu
1004    Roaring Moon
1005    Iron Valiant
1006        Koraidon
1007        Miraidon
1008    Walking Wake
1009     Iron Leaves
Name: Name, dtype: object
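
    head(n) and tail(n) select the same elements as positional slices, and a negative n means "everything except that many elements". A small sketch on a made-up series of ten integers:

```python
import pandas as pd

s = pd.Series(range(10))

same_head = s.head(3).equals(s.iloc[:3])    # first three elements
same_tail = s.tail(3).equals(s.iloc[-3:])   # last three elements

all_but_last_two = s.head(-2)    # negative n: drop the last two elements
all_but_first_two = s.tail(-2)   # negative n: drop the first two elements

print(list(all_but_last_two), list(all_but_first_two))
```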