# Contents

You will learn about pandas Series, specifically:
1. How to create a pandas Series
2. What indexes are and how to access data through them
    1. Two very important indexers: `.loc` and `.iloc`
3. What type of data a Series can store
4. What the datatypes of a pandas Series are
5. How to extract data from a Series into a NumPy array

# Imports

In [1]:
import pandas as pd

# What is a Pandas Series

A pandas Series is a 1-dimensional array of data in which every element is labeled by an index. The documentation for `pandas.Series` can be accessed through [this link](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.html).

# Creating a Pandas Series

Creating a pandas Series is very easy: you just call the `pd.Series` constructor on some 1-dimensional [Iterable data](https://www.pythonlikeyoumeanit.com/Module2_EssentialsOfPython/Iterables.html) (e.g. a Python list, range, tuple, NumPy array, etc.).

In [2]:
pd.Series(['Washington, D.C.', 'Ottawa', 'London', 'Berlin', 'Paris', 'Tokyo', 'Canberra', 'Brasília', 'New Delhi', 'Beijing'])

0    Washington, D.C.
1              Ottawa
2              London
3              Berlin
4               Paris
5               Tokyo
6            Canberra
7            Brasília
8           New Delhi
9             Beijing
dtype: object
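The same constructor accepts the other iterables mentioned above. A minimal sketch (the variable names `from_range`, `from_tuple` and `from_array` are made up for this illustration):

```python
import numpy as np
import pandas as pd

# Any 1-dimensional iterable works as input data
from_range = pd.Series(range(3))
from_tuple = pd.Series(('a', 'b', 'c'))
from_array = pd.Series(np.array([1.5, 2.5, 3.5]))

print(from_range.tolist())  # [0, 1, 2]
print(from_tuple.tolist())  # ['a', 'b', 'c']
print(from_array.tolist())  # [1.5, 2.5, 3.5]
```

In each case pandas builds the default index 0, 1, 2 on its own.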

You can also pass in a dictionary:

In [3]:
country_capital_dict = {
    'United States': 'Washington, D.C.',
    'Canada': 'Ottawa',
    'United Kingdom': 'London',
    'Germany': 'Berlin',
    'France': 'Paris',
    'Japan': 'Tokyo',
    'Australia': 'Canberra',
    'Brazil': 'Brasília',
    'India': 'New Delhi',
    'China': 'Beijing'
}

pd.Series(country_capital_dict)

United States     Washington, D.C.
Canada                      Ottawa
United Kingdom              London
Germany                     Berlin
France                       Paris
Japan                        Tokyo
Australia                 Canberra
Brazil                    Brasília
India                    New Delhi
China                      Beijing
dtype: object

# Indexes

Did you notice the indexes up there? When you pass in an `Iterable`, pandas automatically creates an index for you, starting from 0.

But you can also use dictionaries to create a pandas Series. The dictionary `keys` become the index of the Series, and the dictionary `values` become its data.

Of course, you can also pass the `data` and `index` separately:

In [5]:
pd.Series(data=['Washington, D.C.', 'Ottawa', 'London'],
          index=['United States', 'Canada', 'United Kingdom']
         )

United States     Washington, D.C.
Canada                      Ottawa
United Kingdom              London
dtype: object

Now look closely at the index in this example:

In [6]:
pd.Series(data=['Mark','Larry','Steve','Sundar'],
          index=['Facebook','Google','Apple','Google'])

Facebook      Mark
Google       Larry
Apple        Steve
Google      Sundar
dtype: object

Index values can be repeated!

![](../media/panda-approves.jpg)

## How to access data in pandas Series?

Why are indexes useful in pandas Series? They allow you to access data more easily:

In [7]:
ceos = pd.Series(data=["Larry", "Bill", "Mark", "Steve"], 
               index=["Google", "Microsoft", "Facebook", "Apple"])
ceos

Google       Larry
Microsoft     Bill
Facebook      Mark
Apple        Steve
dtype: object

We can access a specific row using its index value:

In [8]:
ceos['Google']

'Larry'

The same can be done with `.loc[]`:

In [9]:
ceos.loc['Google']

'Larry'
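A label lookup returns a single value only when the label is unique. With the repeated `Google` label from the earlier example, the same lookup returns a Series containing every match. A small sketch (the name `leaders` is chosen just for this illustration):

```python
import pandas as pd

leaders = pd.Series(data=['Mark', 'Larry', 'Steve', 'Sundar'],
                    index=['Facebook', 'Google', 'Apple', 'Google'])

google = leaders.loc['Google']  # label appears twice -> a Series
apple = leaders.loc['Apple']    # label appears once  -> a single value

print(type(google).__name__, google.tolist())  # Series ['Larry', 'Sundar']
print(apple)                                   # Steve
print(leaders.index.is_unique)                 # False
```

Checking `index.is_unique` first is one way to know which of the two results to expect.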

We can also access by position like we already do in lists:

In [10]:
ceos.iloc[0]

'Larry'

Negative positions work as well:

In [11]:
ceos.iloc[-1]

'Steve'

And of course, slicing also works here:

In [12]:
ceos.iloc[1:3] # returns another Series object with only these two elements! 

Microsoft    Bill
Facebook     Mark
dtype: object

What is the difference between **.loc** and **.iloc**?

- **.loc**: selects by index **label**, written exactly as it appears in the index
- **.iloc**: selects by **i**nteger position, counting from 0 as in a Python list
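The difference is easiest to see when the labels are themselves integers, or when slicing. The sketch below uses a made-up Series called `scores` plus the `ceos` Series from above. Note that a `.loc` slice includes its end label, while an `.iloc` slice excludes its end position:

```python
import pandas as pd

# Integer labels that do not match the positions
scores = pd.Series([10, 20, 30], index=[2, 0, 1])
print(scores.loc[0])   # 20 -> the value whose label is 0
print(scores.iloc[0])  # 10 -> the value in the first position

ceos = pd.Series(data=["Larry", "Bill", "Mark", "Steve"],
                 index=["Google", "Microsoft", "Facebook", "Apple"])

by_label = ceos.loc['Microsoft':'Facebook']  # end label is included
by_position = ceos.iloc[1:2]                 # end position is excluded

print(by_label.tolist())     # ['Bill', 'Mark']
print(by_position.tolist())  # ['Bill']
```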

![](../media/dory.jpg)

# What can a Pandas Series store?

Anything, really. It can be numbers:

In [13]:
pd.Series([1,4,6,3,2])

0    1
1    4
2    6
3    3
4    2
dtype: int64

It can be strings:

In [14]:
pd.Series(['Washington, D.C.', 'Ottawa', 'London', 'Berlin', 'Paris'])

0    Washington, D.C.
1              Ottawa
2              London
3              Berlin
4               Paris
dtype: object

Or any other data type, really:

In [17]:
pd.Series([{'a':1,'b':2},{'a':6,'b':7}])

0    {'a': 1, 'b': 2}
1    {'a': 6, 'b': 7}
dtype: object

Including custom objects:

In [18]:
class Student:
    def __init__(self,name, number):
        self.name = name
        self.number = number
    def __repr__(self):
        return f'Student {self.name}, no. {self.number}'
    
pd.Series([Student('João',54728), Student('Inês',55782), Student('Diogo', 53829)])

0     Student João, no. 54728
1     Student Inês, no. 55782
2    Student Diogo, no. 53829
dtype: object

Finally, it can also store a mix of different types:

In [19]:
pd.Series([1,2,Student('Sara', 18290), 'Apparently this works'])

0                          1
1                          2
2    Student Sara, no. 18290
3      Apparently this works
dtype: object

![](../media/fry.jpg)

# Datatypes

Pandas is intimately connected to NumPy and uses its datatypes: `float`, `int`, `bool`, `timedelta64[ns]` and `datetime64[ns]`. In pandas, integers and floats are **64-bit** types (`int64` and `float64`) by default.

In addition, pandas has its own so-called extension datatypes, e.g. strings, periods, intervals and categoricals. You can see the full list [here](https://pandas.pydata.org/docs/user_guide/basics.html#dtypes) (no need to memorize it).

In [21]:
pd.Series([1,4,6,3,2])

0    1
1    4
2    6
3    3
4    2
dtype: int64

Because this Series only stores integers, pandas automatically assigns it the datatype `int64`.
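The same inference happens for other kinds of values, and a dtype can also be requested explicitly or converted afterwards. A short sketch with illustrative variable names:

```python
import pandas as pd

ints = pd.Series([1, 4, 6])
floats = pd.Series([1.0, 4.5, 6.0])
bools = pd.Series([True, False, True])
# Extension dtypes are requested explicitly through the dtype argument
cats = pd.Series(['low', 'high', 'low'], dtype='category')

print(ints.dtype)    # int64
print(floats.dtype)  # float64
print(bools.dtype)   # bool
print(cats.dtype)    # category

# astype returns a new Series with the requested dtype
as_float = ints.astype('float64')
print(as_float.tolist())  # [1.0, 4.0, 6.0]
```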

In [22]:
pd.Series(['Washington, D.C.', 'Ottawa', 'London', 'Berlin', 'Paris'])

0    Washington, D.C.
1              Ottawa
2              London
3              Berlin
4               Paris
dtype: object

If you have a Series of just strings, pandas will use the dtype `object`.

In [23]:
pd.Series([1, 'two', 3])

0      1
1    two
2      3
dtype: object

But wait! What's happening here? This Series also says it is of type `object`.

The dtype `object` is pandas' way of storing data when the elements of the Series are heterogeneous **or** when the data types are more complex than the built-in numerical or datetime types.

However, as the previous example showed, a Series of only strings also ends up with this same `object` dtype by default. You can change that:

In [24]:
pd.Series(['one', 'two', 'three'],dtype='string')

0      one
1      two
2    three
dtype: string

A Series with dtype `string` will not let you store values of any other type in it. This matters because, while manipulating data across several operations, you may by mistake try to store numbers or something else in this Series. The `string` dtype helps ensure you don't make that mistake: it's a safety mechanism.
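A small sketch of that safety mechanism, compared with an `object` Series. The exact exception type and message may differ between pandas versions, so the example catches both `TypeError` and `ValueError` (the names `words`, `loose` and `rejected` are made up for this illustration):

```python
import pandas as pd

words = pd.Series(['one', 'two', 'three'], dtype='string')

try:
    words.iloc[0] = 1  # an integer is not a string
    rejected = False
except (TypeError, ValueError) as err:
    rejected = True
    print(f'Rejected: {err}')

print(words.tolist())  # ['one', 'two', 'three'] (unchanged)

# An object Series accepts the same assignment without complaint
loose = pd.Series(['one', 'two', 'three'], dtype=object)
loose.iloc[0] = 1
print(loose.tolist())  # [1, 'two', 'three']
```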

# Extracting data from a Series

Sometimes you need to extract the data from your Series as an array. The `to_numpy()` method returns the data as a NumPy array:

In [25]:
my_series = pd.Series([1,2,3,4,5],index=['A','B','C','D','E'])
my_series

A    1
B    2
C    3
D    4
E    5
dtype: int64

In [26]:
my_series.to_numpy()

array([1, 2, 3, 4, 5])

The same works if you need the index:

In [27]:
my_series.index.to_numpy()

array(['A', 'B', 'C', 'D', 'E'], dtype=object)
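Once extracted, the values behave like any other NumPy array. The sketch below also shows two related options: passing `dtype` to `to_numpy()` to convert during extraction, and `tolist()` for a plain Python list (the variable names are illustrative):

```python
import numpy as np
import pandas as pd

my_series = pd.Series([1, 2, 3, 4, 5], index=['A', 'B', 'C', 'D', 'E'])

values = my_series.to_numpy()                   # NumPy array, index dropped
as_float = my_series.to_numpy(dtype='float64')  # convert while extracting
as_list = my_series.tolist()                    # plain Python list

print(type(values).__name__)  # ndarray
print(values.sum())           # 15
print(as_float)               # [1. 2. 3. 4. 5.]
print(as_list)                # [1, 2, 3, 4, 5]
```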