A pandas Series object is a one-dimensional, labeled array made up of an index and data of a single data type. By default, the index is autogenerated and starts at 0.

A couple of important things to note about a Series:

- If I try to make a pandas Series using multiple data types, like int and string values, the data will be stored with the generic object data type; the int values are still Python ints, but the Series as a whole no longer supports numeric operations.

- A pandas Series can be created in several ways; we will look at a few of them below. However, **it will most often be created by selecting a single column from a pandas DataFrame, in which case the Series retains the same index as the DataFrame.** We will dive into this in the next two lessons: DataFrames and Advanced DataFrames.
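
To make the first point above concrete, here is a minimal sketch; the name `mixed_series` and the sample values are made up for illustration. It shows that mixing ints and strings produces the object dtype, and that arithmetic which works on an all-int Series fails on the mixed one.

```python
import pandas as pd

# Mixing int and str values makes pandas fall back to the generic object dtype.
mixed_series = pd.Series([1, 2, 'three'])
print(mixed_series.dtype)

# Each element keeps its original Python type ...
print(type(mixed_series.iloc[0]))

# ... but arithmetic on the whole Series fails once it reaches the string.
try:
    mixed_series + 1
    arithmetic_failed = False
except TypeError as error:
    arithmetic_failed = True
    print('Adding 1 failed:', error)
```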

In [3]:
import numpy as np
import pandas as pd # Convention is to import the `pandas` module with the alias pd
import matplotlib.pyplot as plt

## Create a series
We can use the pandas Series constructor function to create a Series:


In [9]:
# from a Python list or a NumPy array.

numbers_series = pd.Series([100, 43, 26, 17, 17])
type(numbers_series)

pandas.core.series.Series

In [7]:
numbers_series

0    100
1     43
2     26
3     17
4     17
dtype: int64

In [10]:
# Notice what happens when we create a Series containing both ints and floats: the ints are upcast to float64.

pd.Series([3, 2, 4.5])

0    3.0
1    2.0
2    4.5
dtype: float64

In [11]:
letters_series = pd.Series(['a', 'e', 'h', 'd', 'b', 'z'])
letters_series

0    a
1    e
2    h
3    d
4    b
5    z
dtype: object

In [12]:
# from a Python dictionary

labeled_series = pd.Series({'a' : 0, 'b' : 1.5, 'c' : 2, 'd': 3.5, 'e': 4, 'f': 5.5})
labeled_series

a    0.0
b    1.5
c    2.0
d    3.5
e    4.0
f    5.5
dtype: float64
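
The dictionary keys above became the index labels. As a small sketch (the variable names here are made up for illustration), the same labeled Series can also be built from a list by passing labels to the constructor's `index` parameter.

```python
import pandas as pd

# Dictionary keys become the index labels.
from_dict = pd.Series({'a': 0, 'b': 1.5, 'c': 2})

# The same labels can be supplied explicitly with the index parameter.
from_list = pd.Series([0, 1.5, 2], index=['a', 'b', 'c'])

# Both Series hold the same values under the same labels.
print(from_dict.equals(from_list))
```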

## Vectorized Operation
Like NumPy arrays, pandas Series are vectorized by default. For example, we can use the basic arithmetic operators to manipulate every element in the Series at once.

In [13]:
numbers_series + 1

0    101
1     44
2     27
3     18
4     18
dtype: int64

In [14]:
numbers_series / 2

0    50.0
1    21.5
2    13.0
3     8.5
4     8.5
dtype: float64

In [15]:
# Comparison operators also work:

numbers_series == 17

0    False
1    False
2    False
3     True
4     True
dtype: bool

In [17]:
numbers_series > 40

0     True
1     True
2    False
3    False
4    False
dtype: bool
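
To see what "vectorized" saves us, here is a rough comparison, assuming the same values as `numbers_series` above; `vectorized` and `looped` are names made up for this sketch. The operator version and the explicit loop produce the same result, but the operator version needs no loop.

```python
import pandas as pd

numbers_series = pd.Series([100, 43, 26, 17, 17])

# Vectorized: the operator is applied to every element in one expression.
vectorized = numbers_series * 2

# The same result written as an explicit loop, shown only for comparison.
looped = pd.Series([number * 2 for number in numbers_series])

print(vectorized.equals(looped))
```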

## Series Attributes
**Attributes** return useful information about a Series' properties; they don't perform operations or calculations with the Series. Attributes are accessed using dot notation (also called attribute access), as we will see in the examples below. In Jupyter Notebook, you can quickly view a list of available attributes by typing the Series name followed by a period and then pressing the Tab key.

There are several components that make up a pandas Series, and I can easily access each component by using attributes.



In [18]:
# .index allows us to reference items in the series.

numbers_series.index

RangeIndex(start=0, stop=5, step=1)

In [20]:
# .values are my data
# The values are stored in a NumPy array. Hello vectorized operations!

numbers_series.values

array([100,  43,  26,  17,  17])

In [22]:
# .dtype is the data type of the elements in the Series.
# pandas has several main data types:
#   int
#   float
#   bool
#   object: strings
#   category: a fixed and limited set of string values
        
numbers_series.dtype

dtype('int64')

In [25]:
# .name is an optional human-friendly name for the Series

numbers_series.name = 'Numbers'
numbers_series

0    100
1     43
2     26
3     17
4     17
Name: Numbers, dtype: int64

In [27]:
# .size attribute returns an int representing the number of rows in the Series
# NULL values are included

numbers_series.size

5

In [28]:
# .shape attribute returns a tuple of (rows, columns) when used on a two-dimensional structure like a DataFrame.
# On a Series, it returns a one-element tuple holding the number of rows. NULL values are included.

numbers_series.shape

(5,)
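
The note that NULL values are included is easier to see with a missing value in the data. This is a small sketch; `with_missing` is a made-up Series, and `.count()` is the method covered later in this lesson.

```python
import numpy as np
import pandas as pd

# One of the four values is missing.
with_missing = pd.Series([100, 43, np.nan, 17])

print(with_missing.size)     # 4: the missing value is included
print(with_missing.shape)    # (4,)
print(with_missing.count())  # 3: only non-null values are counted
```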

## Series Methods
**Methods** used on pandas Series objects often return new Series objects rather than modifying the original; some also offer an `inplace` parameter that defaults to `False`, which keeps the user from accidentally mutating the original Series object.

- If I want to save any manipulations or transformations I make on my Series, I can either assign the result to a variable or, where the method supports it, set `inplace=True`.
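
Here is a minimal sketch of both options, using `.sort_values()` as the example method since it accepts an `inplace` parameter; the names `sorted_numbers` and `sorted_in_place` are made up for illustration.

```python
import pandas as pd

numbers_series = pd.Series([100, 43, 26, 17, 17])

# Calling the method without saving the result leaves the original unchanged.
numbers_series.sort_values()
print(numbers_series.tolist())   # [100, 43, 26, 17, 17]

# Option 1: assign the returned Series to a variable.
sorted_numbers = numbers_series.sort_values()
print(sorted_numbers.tolist())   # [17, 17, 26, 43, 100]

# Option 2: set inplace=True to modify the Series itself (done on a copy here).
sorted_in_place = numbers_series.copy()
sorted_in_place.sort_values(inplace=True)
print(sorted_in_place.tolist())  # [17, 17, 26, 43, 100]
```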

Series have a number of useful methods that we can use for various sorts of manipulations and transformations; let's look at a few.



In [32]:
# .head(n) method returns the first n rows in the Series; n = 5 by default.
# This method returns a new Series with the same indexing as the original Series.

numbers_series.head(2)

0    100
1     43
Name: Numbers, dtype: int64

In [34]:
# .tail(n) method returns the last n rows in the Series; n = 5 by default.
# Increase/decrease your value for n to return more/fewer than 5 rows.
numbers_series.tail(2)

3    17
4    17
Name: Numbers, dtype: int64

In [35]:
# .sample(n) method returns a random sample of rows in the Series; n = 1 by default.
# The index is retained.

numbers_series.sample()

3    17
Name: Numbers, dtype: int64

In [37]:
# .astype --> we can convert the data types of the values in our Series with the .astype method.
num_strings = pd.Series([3, 4, 5, 6]).astype('str')
num_strings

0    3
1    4
2    5
3    6
dtype: object

In [39]:
floats = pd.Series([3, 4, 5, 6]).astype('float')
floats 

0    3.0
1    4.0
2    5.0
3    6.0
dtype: float64

In [40]:
floats.astype('int')

0    3
1    4
2    5
3    6
dtype: int64
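
One thing to watch for when converting floats to ints: the conversion above was lossless only because every float was a whole number. In this small sketch with made-up values, `.astype('int')` drops the fractional part rather than rounding, so `.round()` is applied first when rounding is what I want.

```python
import pandas as pd

decimals = pd.Series([3.7, 4.2, -1.9])

# astype('int') truncates toward zero; it does not round.
truncated = decimals.astype('int')
print(truncated.tolist())  # [3, 4, -1]

# Round first if the nearest whole number is wanted.
rounded = decimals.round().astype('int')
print(rounded.tolist())    # [4, 4, -2]
```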

In [42]:
# .value_counts method returns a new Series consisting of a labeled index
# representing the unique values from the original Series and values
# representing how often each unique value appears in the original Series.
# It's like SQL performing a GROUP BY with a COUNT

pd.Series(['a', 'b', 'a', 'c', 'b', 'a', 'd', 'a']).value_counts()


a    4
b    2
c    1
d    1
dtype: int64
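
If proportions are more useful than raw counts, `.value_counts()` also accepts a `normalize` parameter. A short sketch using the same letters as above (the names `letters` and `proportions` are made up for illustration):

```python
import pandas as pd

letters = pd.Series(['a', 'b', 'a', 'c', 'b', 'a', 'd', 'a'])

# normalize=True returns each value's share of the total instead of its count.
proportions = letters.value_counts(normalize=True)

print(proportions['a'])  # 0.5, since 'a' is 4 of the 8 values
print(proportions['b'])  # 0.25
```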

In [43]:
# .describe method returns a Series of descriptive stats on a pandas Series.
# The info it returns depends on the data type of the elements in the Series

numbers_series.describe()

count      5.000000
mean      40.600000
std       34.861153
min       17.000000
25%       17.000000
50%       26.000000
75%       43.000000
max      100.000000
Name: Numbers, dtype: float64
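
To illustrate how the output depends on the data type, here is a sketch of `.describe()` on a Series of strings (made-up values). Instead of the mean and quartiles shown above, it reports the count, the number of unique values, the most common value, and that value's frequency.

```python
import pandas as pd

repeated_letters = pd.Series(['a', 'b', 'a', 'c'])

# For string data, describe() returns count, unique, top, and freq.
summary = repeated_letters.describe()

print(summary.index.tolist())  # ['count', 'unique', 'top', 'freq']
print(summary['top'])          # a
print(summary['freq'])         # 2
```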

In [44]:
# .count()
# .sum()
# .mean()
{
    'count': numbers_series.count(),
    'sum': numbers_series.sum(),
    'mean': numbers_series.mean()
}

{'count': 5, 'sum': 203, 'mean': 40.6}

In [None]:
# .nlargest()
# .nsmallest()
# These methods return the n largest or n smallest values from a pandas Series.
# Can set the *keep* parameter to *first*, *last*, or *all* to deal with duplicate largest or smallest values
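
The cell above has no code, so here is a minimal sketch of how these two methods might be used, assuming the same values as `numbers_series`. The duplicated 17 shows what the keep parameter changes.

```python
import pandas as pd

numbers_series = pd.Series([100, 43, 26, 17, 17])

# The two largest values; the original index labels are retained.
largest_two = numbers_series.nlargest(2)
print(largest_two.tolist())              # [100, 43]

# 17 appears twice, so keep decides which occurrence(s) come back.
smallest_first = numbers_series.nsmallest(1)             # keep='first' is the default
smallest_last = numbers_series.nsmallest(1, keep='last')
smallest_all = numbers_series.nsmallest(1, keep='all')   # may return more than n rows

print(smallest_first.index.tolist())     # [3]
print(smallest_last.index.tolist())      # [4]
print(len(smallest_all))                 # 2
```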

