In [2]:
import pandas as pd

## Create A `Series` Object from A Python List

##### A Series has a dtype, which indicates the data type of its elements.
##### In the example below, the elements are strings. For string elements, pandas shows the dtype as `object`.
##### By default, if we don't provide an index, pandas supplies a numeric index starting at 0. The index is displayed on the left.
##### Unlike positions in a Python list, index labels in pandas can be non-numeric as well, e.g. strings or datetimes.

##### `pd.Series` is the constructor that creates a Series object from Python data types such as lists.

In [2]:
ice_cream = ["Chocolate", "Vanilla", "Strawberry", "Rum Raisin"]

pd.Series(ice_cream)

0     Chocolate
1       Vanilla
2    Strawberry
3    Rum Raisin
dtype: object

##### A Series containing integers. Note that the dtype is int64.
##### The dtype refers to the type of the values in the Series, not the index.

In [7]:
lottery = [4, 8, 15, 16, 23, 42]

pd.Series(lottery)

0     4
1     8
2    15
3    16
4    23
5    42
dtype: int64
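To make the point about values versus index concrete, here is a small sketch. The labels `first`, `second`, and `third` are made up for illustration; the Series dtype follows the integer values even though the index labels are strings.

```python
import pandas as pd

# Hypothetical labels; the values are a subset of the lottery numbers above.
lottery = pd.Series([4, 8, 15], index=["first", "second", "third"])

print(lottery.dtype)        # int64: describes the values
print(lottery.index.dtype)  # the index has its own, separate dtype
```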

##### A Series containing Booleans. Note that the dtype is bool

In [6]:
registrations = [True, False, False, False, True]

pd.Series(registrations)

0     True
1    False
2    False
3    False
4     True
dtype: bool

## Create A `Series` Object from a Dictionary

##### Recall that when we don't provide an explicit index, pandas supplies a numeric index starting at 0. However, as discussed before, the index does not need to be numeric; labels can be of any hashable type. This is an advantage a Series has over a list.

##### When we create a Series from a dictionary, the dictionary keys become the index labels and the dictionary values become the Series values. An advantage of a Series over a dict is that index labels need not be unique: duplicate index labels are allowed.

##### The key takeaway is that a Series is something like a blend of a Python list and a Python dictionary. It combines the best features of both and adds a large set of functionality on top.

In [5]:
webster = {"Aardvark" : "An animal",
           "Banana" : "A delicious fruit",
           "Cyan" : "A color"}

pd.Series(webster)

Aardvark            An animal
Banana      A delicious fruit
Cyan                  A color
dtype: object

## Intro to Attributes

##### Objects in Python have attributes and methods. Attributes return information about the object and do not modify it. In comparison, methods operate on an object: they perform some kind of manipulation, operation, or calculation.


In [3]:
about_me = ["Smart", "Handsome", "Charming", "Brilliant", "Humble"]
s = pd.Series(about_me)
s

0        Smart
1     Handsome
2     Charming
3    Brilliant
4       Humble
dtype: object

##### **s.values**: Return Series as ndarray or ndarray-like depending on the dtype.
##### **s.index**: Returns an Index object for the Series. RangeIndex is an Immutable Index implementing a monotonic integer range. This is the default index type used by DataFrame and Series when no explicit index is provided by the user.
##### **s.dtype**: Return the dtype object of the underlying data. This indicates the type of elements in the Series

In [4]:
s.values

array(['Smart', 'Handsome', 'Charming', 'Brilliant', 'Humble'],
      dtype=object)

In [5]:
s.index

RangeIndex(start=0, stop=5, step=1)

In [6]:
s.dtype

dtype('O')

## Intro to Methods

In [8]:
prices = [2.99, 4.45, 1.36]
s = pd.Series(prices)
s

0    2.99
1    4.45
2    1.36
dtype: float64

**The sum() method on a Series object**: Return the sum of the values for the requested axis.

**Signature**:
`s.sum(
    axis=None,
    skipna=None,
    level=None,
    numeric_only=None,
    min_count=0,
    **kwargs,
)`


`
axis : {index (0)}
    Axis for the function to be applied on.
skipna : bool, default True
    Exclude NA/null values when computing the result.
level : int or level name, default None
    If the axis is a MultiIndex (hierarchical), count along a
    particular level, collapsing into a scalar.
numeric_only : bool, default None
    Include only float, int, boolean columns. If None, will attempt to use
    everything, then use only numeric data. Not implemented for Series.
min_count : int, default 0
    The required number of valid values to perform the operation. If fewer than
    ``min_count`` non-NA values are present the result will be NA.`
    
--------


In [9]:
s.sum()

8.8
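The `skipna` and `min_count` parameters described above only matter when values are missing. The following sketch assumes a made-up variant of the prices with one `NaN` added to show their effect.

```python
import numpy as np
import pandas as pd

# Hypothetical data: the prices from above plus one missing value.
prices_with_gap = pd.Series([2.99, 4.45, np.nan, 1.36])

total_skipping_nan = prices_with_gap.sum()            # NaN is skipped by default
total_with_nan = prices_with_gap.sum(skipna=False)    # NaN propagates into the result
total_needs_four = prices_with_gap.sum(min_count=4)   # only 3 valid values, so NaN

print(total_skipping_nan)
print(total_with_nan)
print(total_needs_four)
```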

**The product() method on Series object**: Return the product of the values for the requested axis

**Signature**:
`s.product(
    axis=None,
    skipna=None,
    level=None,
    numeric_only=None,
    min_count=0,
    **kwargs,
)`

`
axis : {index (0)}
    Axis for the function to be applied on.
skipna : bool, default True
    Exclude NA/null values when computing the result.
level : int or level name, default None
    If the axis is a MultiIndex (hierarchical), count along a
    particular level, collapsing into a scalar.
numeric_only : bool, default None
    Include only float, int, boolean columns. If None, will attempt to use
    everything, then use only numeric data. Not implemented for Series.
min_count : int, default 0
    The required number of valid values to perform the operation. If fewer than
    ``min_count`` non-NA values are present the result will be NA.`

In [10]:
s.product()

18.095480000000006

**The mean() method on a Series object:** Return the mean of the values for the requested axis.

**Signature:** `s.mean(axis=None, skipna=None, level=None, numeric_only=None, **kwargs)`


`axis : {index (0)}
    Axis for the function to be applied on.
skipna : bool, default True
    Exclude NA/null values when computing the result.
level : int or level name, default None
    If the axis is a MultiIndex (hierarchical), count along a
    particular level, collapsing into a scalar.
numeric_only : bool, default None
    Include only float, int, boolean columns. If None, will attempt to use
    everything, then use only numeric data. Not implemented for Series.
**kwargs
    Additional keyword arguments to be passed to the function.`


In [11]:
s.mean()

2.9333333333333336

## Parameters and Arguments

In [None]:
# Difficulty - Easy, Medium, Hard
# Volume - 1 through 10
# Subtitles - True / False

##### The `pd.Series()` constructor has several parameters that can be passed positionally or by keyword. The first is `data` and the second is `index`.
##### Thus we can provide iterables to be used as the data and the index. When the data is a list, the data and the index must have the same length, or we get an error.

**The pd.Series() method:** 
##### One-dimensional ndarray with axis labels (including time series).
##### Labels need not be unique but must be a hashable type. The object supports both integer- and label-based indexing and provides a host of methods for performing operations involving the index. Statistical methods from ndarray have been overridden to automatically exclude missing data (currently represented as NaN).
##### Operations between Series (+, -, /, *, ** ) align values based on their associated index values-- they need not be the same length. The result index will be the sorted union of the two indexes.
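As a rough illustration of index alignment, the sketch below adds two small, made-up Series whose labels only partly overlap. Labels present in just one operand produce `NaN` in the result.

```python
import pandas as pd

# Hypothetical Series with partially overlapping labels.
left = pd.Series([1, 2, 3], index=["a", "b", "c"])
right = pd.Series([10, 20], index=["b", "d"])

combined = left + right  # values are matched by label, not by position

print(combined)
```

Only label `b` exists in both operands, so it is the only entry with a numeric result.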

`Init signature:
pd.Series(
    data=None,
    index=None,
    dtype=None,
    name=None,
    copy=False,
    fastpath=False,
)`


`
data : array-like, Iterable, dict, or scalar value
    Contains data stored in Series.
index : array-like or Index (1d)
    Values must be hashable and have the same length as `data`.
    Non-unique index values are allowed. Will default to RangeIndex (0, 1, 2, ..., n) if not provided. If both a dict and index
    sequence are used, the index will override the keys found in the dict.    
dtype : str, numpy.dtype, or ExtensionDtype, optional
    Data type for the output Series. If not specified, this will be inferred from `data`.
name : str, optional
    The name to give to the Series.
copy : bool, default False
    Copy input data.`
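Two of the rules in the parameter description can be checked with a short sketch: a list and an index of different lengths are rejected, and an index passed alongside a dict selects from the dict's keys. The fruit counts and the `Kiwi` label are invented for the example.

```python
import pandas as pd

fruits = ["Apple", "Orange", "Plum"]

# A list and an index of different lengths cannot be paired up.
try:
    pd.Series(fruits, index=["Monday", "Tuesday"])
    length_error = None
except ValueError as exc:
    length_error = exc

# With a dict, the index selects keys; labels missing from the dict become NaN.
stock = pd.Series({"Apple": 5, "Orange": 7}, index=["Orange", "Kiwi"])

print(type(length_error).__name__)
print(stock)
```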

In [21]:
fruits = ["Apple", "Orange", "Plum", "Grape", "Blueberry"]
weekdays = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday"]

pd.Series(fruits, weekdays)
pd.Series(data = fruits, index = weekdays)
pd.Series(fruits, index = weekdays)

Monday           Apple
Tuesday         Orange
Wednesday         Plum
Thursday         Grape
Friday       Blueberry
dtype: object

##### The example below shows that it's okay to have duplicate index labels. When the index contains duplicates, some data operations may not be possible. The point to remember, however, is that index labels need not be unique.

In [22]:
fruits = ["Apple", "Orange", "Plum", "Grape", "Blueberry", "Watermelon"]
weekdays = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Monday"]

pd.Series(data = fruits, index = weekdays)

Monday            Apple
Tuesday          Orange
Wednesday          Plum
Thursday          Grape
Friday        Blueberry
Monday       Watermelon
dtype: object
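One visible consequence of duplicate labels is that a lookup may return more than one element. The sketch below rebuilds the Series above under the assumed name `menu` and compares a duplicated label with a unique one.

```python
import pandas as pd

fruits = ["Apple", "Orange", "Plum", "Grape", "Blueberry", "Watermelon"]
weekdays = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Monday"]
menu = pd.Series(data=fruits, index=weekdays)

monday_items = menu["Monday"]    # duplicated label: a Series with two rows
tuesday_item = menu["Tuesday"]   # unique label: a single string

print(menu.index.is_unique)
print(monday_items)
print(tuesday_item)
```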

## Import `Series` with the `read_csv` Method

**The pd.read_csv() method** : Read a comma-separated values (csv) file into DataFrame. Also supports optionally iterating or breaking of the file into chunks.

**Signature:**
`pd.read_csv(
    filepath_or_buffer: Union[str, pathlib.Path, IO[~AnyStr]],
    sep=',',
    delimiter=None,
    header='infer',
    names=None,
    index_col=None,
    usecols=None,
    squeeze=False,
    prefix=None,
    mangle_dupe_cols=True,
    dtype=None,
    engine=None,
    converters=None,
    true_values=None,
    false_values=None,
    skipinitialspace=False,
    skiprows=None,
    skipfooter=0,
    nrows=None,
    na_values=None,
    keep_default_na=True,
    na_filter=True,
    verbose=False,
    skip_blank_lines=True,
    parse_dates=False,
    infer_datetime_format=False,
    keep_date_col=False,
    date_parser=None,
    dayfirst=False,
    cache_dates=True,
    iterator=False,
    chunksize=None,
    compression='infer',
    thousands=None,
    decimal: str = '.',
    lineterminator=None,
    quotechar='"',
    quoting=0,
    doublequote=True,
    escapechar=None,
    comment=None,
    encoding=None,
    dialect=None,
    error_bad_lines=True,
    warn_bad_lines=True,
    delim_whitespace=False,
    low_memory=True,
    memory_map=False,
    float_precision=None,
)`


##### There are many more `read_*` functions available in pandas. Interesting ones include `read_excel`, `read_clipboard`, `read_json`, `read_orc`, `read_pickle`, `read_sql`, `read_sql_query`, `read_sql_table`, `read_table`, etc.


**filepath_or_buffer: str, path object or file-like object**
    `Any valid string path is acceptable. The string could be a URL. Valid URL schemes include http, ftp, s3, and file. For file URLs, a host is expected. A local file could be: file://localhost/path/to/table.csv.
    If you want to pass in a path object, pandas accepts any ``os.PathLike``.
    By file-like object, we refer to objects with a ``read()`` method, such as a file handler (e.g. via builtin ``open`` function) or ``StringIO``.`
    
**usecols: list-like or callable, optional**
    `Return a subset of the columns. If list-like, all elements must either be positional (i.e. integer indices into the document columns) or strings that correspond to column names provided either by the user in `names` or inferred from the document header row(s). 
For example, a valid list-like `usecols` parameter would be ``[0, 1, 2]`` or ``['foo', 'bar', 'baz']``.
Element order is ignored, so ``usecols=[0, 1]`` is the same as ``[1, 0]``. To instantiate a DataFrame from ``data`` with element order preserved use ``pd.read_csv(data, usecols=['foo', 'bar'])[['foo', 'bar']]`` for columns in ``['foo', 'bar']`` order or ``pd.read_csv(data, usecols=['foo', 'bar'])[['bar', 'foo']]`` for ``['bar', 'foo']`` order.
If callable, the callable function will be evaluated against the column names, returning names where the callable function evaluates to True. An example of a valid callable argument would be ``lambda x: x.upper() in ['AAA', 'BBB', 'DDD']``. Using this parameter results in much faster parsing time and lower memory usage.`

**squeeze: bool, default False**
    `If the parsed data contains only one column, return a Series. By default, read_csv returns a DataFrame.`

In [12]:
pokemon = pd.read_csv("pokemon.csv", usecols = ["Pokemon"], squeeze = True)
pokemon

0       Bulbasaur
1         Ivysaur
2        Venusaur
3      Charmander
4      Charmeleon
          ...    
716       Yveltal
717       Zygarde
718       Diancie
719         Hoopa
720     Volcanion
Name: Pokemon, Length: 721, dtype: object

##### By default, pandas displays the first 5 and last 5 rows of a long Series. The ... indicates that the middle part is not displayed.

In [13]:
# Tab completion also works for file names
google = pd.read_csv("google_stock_price.csv", squeeze = True)
google

0        50.12
1        54.10
2        54.65
3        52.38
4        52.95
         ...  
3007    772.88
3008    771.07
3009    773.18
3010    771.61
3011    782.22
Name: Stock Price, Length: 3012, dtype: float64
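The `squeeze` parameter used above was deprecated in pandas 1.4 and removed in pandas 2.0, so these cells raise an error on recent versions. A minimal sketch of the replacement is to read a one-column DataFrame and call `.squeeze("columns")` on it. The in-memory CSV below is a made-up stand-in for `pokemon.csv`.

```python
import io
import pandas as pd

# Hypothetical in-memory CSV standing in for pokemon.csv.
csv_text = "Pokemon,Type\nBulbasaur,Grass\nCharmander,Fire\nSquirtle,Water\n"

frame = pd.read_csv(io.StringIO(csv_text), usecols=["Pokemon"])  # one-column DataFrame
names = frame.squeeze("columns")                                 # converted to a Series

print(type(frame).__name__)
print(type(names).__name__)
```

The resulting Series keeps the column header as its name, just as `squeeze = True` did.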

## The `.head()` and `.tail()` Methods

In [31]:
pokemon = pd.read_csv("pokemon.csv", usecols = ["Pokemon"], squeeze = True)
google = pd.read_csv("google_stock_price.csv", squeeze = True)

#### **The head() method**: Return the first `n` rows. This function returns the first `n` rows of the object based on position. It is useful for quickly testing whether your object has the right type of data in it.
#### For negative values of `n`, this function returns all rows except the last `|n|` rows, equivalent to ``df[:n]``.

#### Note that the return type is FrameOrSeries, indicating that head() and tail() work for both DataFrames and Series.
`
Signature: pokemon.head(n: int = 5) -> ~FrameOrSeries.
Parameters:
n : int, default 5
    Number of rows to select.
Returns:
same type as caller
    The first `n` rows of the caller object.`

In [36]:
pokemon.head(1)

0    Bulbasaur
Name: Pokemon, dtype: object

#### **The tail() method**: Return the last `n` rows. This function returns the last `n` rows of the object based on position. It is useful for quickly verifying data, for example, after sorting or appending rows.
#### For negative values of `n`, this function returns all rows except the first `|n|` rows, equivalent to ``df[|n|:]``.

`Signature: google.tail(n: int = 5) -> ~FrameOrSeries
Parameters
n : int, default 5
    Number of rows to select.
Returns
type of caller
    The last `n` rows of the caller object.`

In [40]:
google.tail(1)

3011    782.22
Name: Stock Price, dtype: float64
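The behavior of `head()` and `tail()` with negative `n` is easier to see on a tiny Series. The sketch below uses a made-up Series of seven letters.

```python
import pandas as pd

letters = pd.Series(list("abcdefg"))

first_two = letters.head(2)            # a, b
all_but_last_two = letters.head(-2)    # a through e
last_two = letters.tail(2)             # f, g
all_but_first_two = letters.tail(-2)   # c through g

print(all_but_last_two.to_list())
print(all_but_first_two.to_list())
```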

## Python Built-In Functions

In [54]:
pokemon = pd.read_csv("pokemon.csv", usecols = ["Pokemon"], squeeze = True)
google = pd.read_csv("google_stock_price.csv", squeeze = True)

#### The Series object works well with Python's built-in functions. Here we simply pass our Series to built-in functions to see how they behave. Note that pandas has its own methods that can do the same things.

#### The len() function, when passed a Series, gives us the number of elements in the Series.
#### We can also pass a Series to the type() function to find its exact type. Note that we get pandas.core.series.Series.
#### Passing a Series to the dir() function gives us a list of all attributes and methods available on the Series object, including the dunder methods.
#### We can also pass a Series to the sorted() function to get a sorted list of its elements.
#### We can use the list() and dict() constructors on a Series to convert it to a list or a dict, respectively. These operations are roughly the reverse of pd.Series(listobject) and pd.Series(dictobject).
#### We can also use the built-in min() and max() functions on a Series to get its minimum and maximum values.


#### Summary: pandas works well with Python's existing built-in functions.

In [56]:
len(pokemon)
len(google)

3012

In [57]:
type(pokemon)

pandas.core.series.Series

In [14]:
dir(pokemon)

['T',
 '_AXIS_ALIASES',
 '_AXIS_IALIASES',
 '_AXIS_LEN',
 '_AXIS_NAMES',
 '_AXIS_NUMBERS',
 '_AXIS_ORDERS',
 '_AXIS_REVERSED',
 '_HANDLED_TYPES',
 '__abs__',
 '__add__',
 '__and__',
 '__annotations__',
 '__array__',
 '__array_priority__',
 '__array_ufunc__',
 '__array_wrap__',
 '__bool__',
 '__class__',
 '__contains__',
 '__copy__',
 '__deepcopy__',
 '__delattr__',
 '__delitem__',
 '__dict__',
 '__dir__',
 '__div__',
 '__divmod__',
 '__doc__',
 '__eq__',
 '__finalize__',
 '__float__',
 '__floordiv__',
 '__format__',
 '__ge__',
 '__getattr__',
 '__getattribute__',
 '__getitem__',
 '__getstate__',
 '__gt__',
 '__hash__',
 '__iadd__',
 '__iand__',
 '__ifloordiv__',
 '__imod__',
 '__imul__',
 '__init__',
 '__init_subclass__',
 '__int__',
 '__invert__',
 '__ior__',
 '__ipow__',
 '__isub__',
 '__iter__',
 '__itruediv__',
 '__ixor__',
 '__le__',
 '__len__',
 '__long__',
 '__lt__',
 '__matmul__',
 '__mod__',
 '__module__',
 '__mul__',
 '__ne__',
 '__neg__',
 '__new__',
 '__nonzero__',
 '__or__

In [17]:
print(type(sorted(pokemon)))
sorted(pokemon)
sorted(google)

<class 'list'>


[49.95,
 50.07,
 50.12,
 50.7,
 50.74,
 50.95,
 51.1,
 51.1,
 51.13,
 52.38,
 52.61,
 52.95,
 53.02,
 53.7,
 53.9,
 54.1,
 54.65,
 55.69,
 55.94,
 56.93,
 58.69,
 58.86,
 59.07,
 59.13,
 59.62,
 59.86,
 60.35,
 63.37,
 64.74,
 65.47,
 66.22,
 67.46,
 67.56,
 68.47,
 68.63,
 68.8,
 69.12,
 69.36,
 70.17,
 70.38,
 70.93,
 71.98,
 73.9,
 74.51,
 74.62,
 82.47,
 83.68,
 83.69,
 83.85,
 84.27,
 84.59,
 84.62,
 84.91,
 85.14,
 85.63,
 85.74,
 86.13,
 86.16,
 86.19,
 86.19,
 86.63,
 87.29,
 87.41,
 87.71,
 88.06,
 88.15,
 88.47,
 88.81,
 89.21,
 89.22,
 89.26,
 89.4,
 89.54,
 89.56,
 89.61,
 89.61,
 89.7,
 89.8,
 89.89,
 89.9,
 89.93,
 89.93,
 89.95,
 90.11,
 90.13,
 90.16,
 90.27,
 90.35,
 90.43,
 90.58,
 90.62,
 90.81,
 90.9,
 90.91,
 91.42,
 91.78,
 92.26,
 92.34,
 92.41,
 92.42,
 92.5,
 92.51,
 92.55,
 92.84,
 92.86,
 92.89,
 92.94,
 93.06,
 93.39,
 93.41,
 93.61,
 93.61,
 93.86,
 93.9,
 93.9,
 93.95,
 94.05,
 94.18,
 94.19,
 94.31,
 94.35,
 94.52,
 94.53,
 95.07,
 95.22,
 95.59,
 95.6,
 

In [61]:
list(pokemon)

['Bulbasaur',
 'Ivysaur',
 'Venusaur',
 'Charmander',
 'Charmeleon',
 'Charizard',
 'Squirtle',
 'Wartortle',
 'Blastoise',
 'Caterpie',
 'Metapod',
 'Butterfree',
 'Weedle',
 'Kakuna',
 'Beedrill',
 'Pidgey',
 'Pidgeotto',
 'Pidgeot',
 'Rattata',
 'Raticate',
 'Spearow',
 'Fearow',
 'Ekans',
 'Arbok',
 'Pikachu',
 'Raichu',
 'Sandshrew',
 'Sandslash',
 'Nidoran',
 'Nidorina',
 'Nidoqueen',
 'Nidoran♂',
 'Nidorino',
 'Nidoking',
 'Clefairy',
 'Clefable',
 'Vulpix',
 'Ninetales',
 'Jigglypuff',
 'Wigglytuff',
 'Zubat',
 'Golbat',
 'Oddish',
 'Gloom',
 'Vileplume',
 'Paras',
 'Parasect',
 'Venonat',
 'Venomoth',
 'Diglett',
 'Dugtrio',
 'Meowth',
 'Persian',
 'Psyduck',
 'Golduck',
 'Mankey',
 'Primeape',
 'Growlithe',
 'Arcanine',
 'Poliwag',
 'Poliwhirl',
 'Poliwrath',
 'Abra',
 'Kadabra',
 'Alakazam',
 'Machop',
 'Machoke',
 'Machamp',
 'Bellsprout',
 'Weepinbell',
 'Victreebel',
 'Tentacool',
 'Tentacruel',
 'Geodude',
 'Graveler',
 'Golem',
 'Ponyta',
 'Rapidash',
 'Slowpoke',
 'Slo

In [None]:
list(google)

In [62]:
dict(google)

{0: 50.119999999999997,
 1: 54.100000000000001,
 2: 54.649999999999999,
 3: 52.380000000000003,
 4: 52.950000000000003,
 5: 53.899999999999999,
 6: 53.020000000000003,
 7: 50.950000000000003,
 8: 51.130000000000003,
 9: 50.07,
 10: 50.700000000000003,
 11: 49.950000000000003,
 12: 50.740000000000002,
 13: 51.100000000000001,
 14: 51.100000000000001,
 15: 52.609999999999999,
 16: 53.700000000000003,
 17: 55.689999999999998,
 18: 55.939999999999998,
 19: 56.93,
 20: 58.689999999999998,
 21: 59.619999999999997,
 22: 58.859999999999999,
 23: 59.130000000000003,
 24: 60.350000000000001,
 25: 59.859999999999999,
 26: 59.07,
 27: 63.369999999999997,
 28: 65.469999999999999,
 29: 64.739999999999995,
 30: 66.219999999999999,
 31: 67.459999999999994,
 32: 69.120000000000005,
 33: 68.469999999999999,
 34: 69.359999999999999,
 35: 68.799999999999997,
 36: 67.560000000000002,
 37: 68.629999999999995,
 38: 70.379999999999995,
 39: 70.930000000000007,
 40: 71.980000000000004,
 41: 74.510000000000005,

In [64]:
max(pokemon)
min(pokemon)

'Abomasnow'

In [65]:
max(google)

782.22000000000003

In [66]:
min(google)

49.950000000000003
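Since pandas has its own methods for the same tasks, a side-by-side comparison may help. The sketch below uses three made-up prices and checks that the built-in functions and the corresponding pandas members agree.

```python
import pandas as pd

# Hypothetical three-element stand-in for the google Series.
prices = pd.Series([54.10, 50.12, 52.38], name="Stock Price")

builtin_results = (len(prices), max(prices), min(prices),
                   sorted(prices), list(prices), dict(prices))
pandas_results = (prices.size, prices.max(), prices.min(),
                  prices.sort_values().to_list(), prices.to_list(), prices.to_dict())

print(builtin_results == pandas_results)
```

One practical difference is that `sorted()` returns a plain list, while `sort_values()` returns a Series that keeps the index labels.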

## More `Series` Attributes

In [75]:
pokemon = pd.read_csv("pokemon.csv", usecols = ["Pokemon"], squeeze = True)
google = pd.read_csv("google_stock_price.csv", squeeze = True)

#### The values attribute: Return Series as ndarray or ndarray-like depending on the dtype.
#### The dtype attribute: Return the dtype object of the underlying data. Returns dtype('O') if the elements of the Series are strings.
#### The is_unique attribute: Return True if the values in the Series are unique. Returns False when the Series contains duplicate values.
#### The ndim attribute: Number of dimensions of the underlying data (by definition 1 for a Series). Can also be used for DataFrames.
#### The shape attribute: Return a tuple of the shape of the underlying data, i.e. the number of rows and columns. It makes more sense for multidimensional objects. For a Series it returns a single-element tuple: (3012,) indicates that the Series has 3012 rows (and, implicitly, a single column). On a DataFrame it returns a two-element tuple, with the first element giving the number of rows and the second the number of columns.
#### The size attribute: Return the number of elements/cells in the underlying data. This attribute counts null values as well, so a missing value in the data still contributes to the size.
#### The name attribute: A Series also has a name. When reading from a CSV, it is taken from the header row of the input. The name is shown when the Series is displayed, and when we call head()/tail() on it, at the end just before the dtype. Importantly, we can assign a new value to this attribute to rename the Series.

In [77]:
pokemon.values
google.values

array([  50.12,   54.1 ,   54.65, ...,  773.18,  771.61,  782.22])

In [79]:
pokemon.index
google.index

RangeIndex(start=0, stop=3012, step=1)

In [19]:

print(pokemon.dtype)
print(google.dtype)
pokemon.dtype

object
float64


dtype('O')

In [83]:
pokemon.is_unique
google.is_unique

False

In [85]:
pokemon.ndim
google.ndim

1

In [87]:
pokemon.shape
google.shape

(3012,)

In [89]:
pokemon.size
google.size

3012

In [93]:
pokemon.name = "Pocket Monsters"

In [94]:
pokemon.head()

0     Bulbasaur
1       Ivysaur
2      Venusaur
3    Charmander
4    Charmeleon
Name: Pocket Monsters, dtype: object
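The attributes above can be tried on a small, made-up Series that contains both a duplicate and a missing value. The sketch also uses the `count()` method, which is not covered in this section, to contrast with `size`.

```python
import numpy as np
import pandas as pd

# Hypothetical readings with one duplicate (1.5) and one missing value.
readings = pd.Series([1.5, np.nan, 1.5, 3.0], name="Reading")

print(readings.size)       # 4: the missing value is included
print(readings.count())    # 3: non-missing values only
print(readings.shape)      # (4,)
print(readings.ndim)       # 1
print(readings.is_unique)  # False: 1.5 appears twice

readings.name = "Sensor Reading"   # the name can be reassigned
print(readings.name)
```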

## The `.sort_values()` Method

In [95]:
pokemon = pd.read_csv("pokemon.csv", usecols = ["Pokemon"], squeeze = True)
google = pd.read_csv("google_stock_price.csv", squeeze = True)

#### **The sort_values() method on the Series:** Sort a Series in ascending or descending order by some criterion.

`Signature:
pokemon.sort_values(
    axis=0,
    ascending=True,
    inplace=False,
    kind='quicksort',
    na_position='last',
    ignore_index=False,
)`

`
Parameters
axis : {0 or 'index'}, default 0
    Axis to direct sorting. The value 'index' is accepted for compatibility with DataFrame.sort_values.
ascending : bool, default True
    If True, sort values in ascending order, otherwise descending.
inplace : bool, default False
    If True, perform operation in-place.
kind : {'quicksort', 'mergesort' or 'heapsort'}, default 'quicksort'
    Choice of sorting algorithm. See also :func:`numpy.sort` for more information. 'mergesort' is the only stable  algorithm.
na_position : {'first' or 'last'}, default 'last'
    Argument 'first' puts NaNs at the beginning, 'last' puts NaNs at the end.
ignore_index : bool, default False
     If True, the resulting axis will be labeled 0, 1, …, n - 1.
     `

In [20]:
pokemon.sort_values().head() # Method chaining.
# sort_values() returns a new Series, on which we can then call head().
# Notice that sorting by value leaves the index labels out of order.

459    Abomasnow
62          Abra
358        Absol
616     Accelgor
680    Aegislash
Name: Pokemon, dtype: object

In [100]:
pokemon.sort_values(ascending = False).tail() #Default is ascending True. Use ascending=False to sort in descending manner.

680    Aegislash
616     Accelgor
358        Absol
62          Abra
459    Abomasnow
Name: Pokemon, dtype: object

#### By default, calling sort_values() does not modify the original Series. Instead, it creates and returns a new sorted Series.
#### To modify the original Series in place, we use the inplace parameter.

In [21]:
google.sort_values(ascending = False).head(1)

3011    782.22
Name: Stock Price, dtype: float64

In [22]:
google

0        50.12
1        54.10
2        54.65
3        52.38
4        52.95
         ...  
3007    772.88
3008    771.07
3009    773.18
3010    771.61
3011    782.22
Name: Stock Price, Length: 3012, dtype: float64

## The `inplace` Parameter

In [112]:
pokemon = pd.read_csv("pokemon.csv", usecols = ["Pokemon"], squeeze = True)
google = pd.read_csv("google_stock_price.csv", squeeze = True)

In [113]:
google.head(3)

0    50.12
1    54.10
2    54.65
Name: Stock Price, dtype: float64

In [115]:
google = google.sort_values()

In [118]:
google.head(3)

11    49.95
9     50.07
0     50.12
Name: Stock Price, dtype: float64

#### The inplace parameter is available on many pandas methods. Passing `inplace=True` has the same visible effect as creating a new object from the operation and assigning it back to the same variable, as done above with `google = google.sort_values()`.
#### We can use that reassignment approach to get the effect of an in-place change when an operation has no `inplace` parameter.

In [119]:
google.sort_values(ascending = False, inplace = True)

In [120]:
google.head(3)

3011    782.22
2859    776.60
3009    773.18
Name: Stock Price, dtype: float64
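One pitfall worth illustrating: with `inplace=True` the method returns `None`, so assigning its result back to the variable would discard the data. The sketch below uses three made-up prices.

```python
import pandas as pd

prices = pd.Series([54.10, 50.12, 52.38])

result = prices.sort_values(inplace=True)   # sorts `prices` itself

print(result)             # None
print(prices.to_list())   # [50.12, 52.38, 54.1]
```

In other words, use either `prices = prices.sort_values()` or `prices.sort_values(inplace=True)`, but not a mix of the two.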

## The `.sort_index()` Method

In [23]:
pokemon = pd.read_csv("pokemon.csv", usecols = ["Pokemon"], squeeze = True)
google = pd.read_csv("google_stock_price.csv", squeeze = True)

In [24]:
pokemon.sort_values(ascending = False, inplace = True)

In [25]:
pokemon.head(3)

717     Zygarde
633    Zweilous
40        Zubat
Name: Pokemon, dtype: object

#### Note that when we sort by values, the index labels end up out of order. We can restore the original order by sorting on the index labels with the sort_index method.

#### **The sort_index() method on Series**: Sort Series by index labels. Returns a new Series sorted by label if `inplace` argument is ``False``, otherwise updates the original series and returns None.

`Signature:
pokemon.sort_index(
    axis=0,
    level=None,
    ascending=True,
    inplace=False,
    kind='quicksort',
    na_position='last',
    sort_remaining=True,
    ignore_index: bool = False,
)`


`Parameters
axis : int, default 0
    Axis to direct sorting. This can only be 0 for Series.
level : int, optional
    If not None, sort on values in specified index level(s).
ascending : bool, default True
    Sort ascending vs. descending.
inplace : bool, default False
    If True, perform operation in-place.
kind : {'quicksort', 'mergesort', 'heapsort'}, default 'quicksort'
    Choice of sorting algorithm. See also :func:`numpy.sort` for more information.  'mergesort' is the only stable algorithm. For DataFrames, this option is only applied when sorting on a single column or label.
na_position : {'first', 'last'}, default 'last'
    If 'first' puts NaNs at the beginning, 'last' puts NaNs at the end.
    Not implemented for MultiIndex.
sort_remaining : bool, default True
    If True and sorting by level and index is multilevel, sort by other levels too (in order) after sorting by specified level.
ignore_index : bool, default False
    If True, the resulting axis will be labeled 0, 1, …, n - 1.`

In [26]:
pokemon.sort_index(ascending = True, inplace = True)
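The cell above produces no output, so here is a small sketch of the round trip on a made-up three-element Series: sort by value, then restore the original order with `sort_index()`.

```python
import pandas as pd

names = pd.Series(["Bulbasaur", "Ivysaur", "Venusaur"])

by_value_desc = names.sort_values(ascending=False)   # index becomes 2, 1, 0
restored = by_value_desc.sort_index()                # index back to 0, 1, 2

print(list(by_value_desc.index))
print(restored.equals(names))
```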

## Python's `in` Keyword

In [149]:
pokemon = pd.read_csv("pokemon.csv", usecols = ["Pokemon"], squeeze = True)
google = pd.read_csv("google_stock_price.csv", squeeze = True)

In [151]:
100 in [1, 2, 3, 4, 5]

False

In [152]:
pokemon.head(3)

0    Bulbasaur
1      Ivysaur
2     Venusaur
Name: Pokemon, dtype: object

#### The Python `in` keyword can be used with a Series to perform a containment check. By default, `in` searches the index labels, not the values. That is why, even though "Venusaur" is present in the Series, the expression `"Venusaur" in pokemon` returns False.

In [27]:
"Venusaur" in pokemon

False

#### The example below shows that the index label 100 exists in the pokemon Series. Since `in` looks at the index labels and not the values, the two statements below are equivalent and both return True.

In [28]:
print(100 in pokemon)
print(100 in pokemon.index)

True
True


In [156]:
pokemon.index

RangeIndex(start=0, stop=721, step=1)

#### To check whether a value exists in a Series, we can use the `in` operator with the `.values` attribute.

In [161]:
"Digimon" in pokemon.values

False
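The three kinds of membership check can be compared side by side on a small, made-up Series that does not need the CSV file.

```python
import pandas as pd

names = pd.Series(["Bulbasaur", "Ivysaur", "Venusaur"])

value_in_series = "Venusaur" in names          # checks the index labels
label_in_series = 2 in names                   # 2 is an index label
value_in_values = "Venusaur" in names.values   # checks the values

print(value_in_series, label_in_series, value_in_values)
```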

## Extract Values by Index Position

In [162]:
pokemon = pd.read_csv("pokemon.csv", usecols = ["Pokemon"], squeeze = True)
google = pd.read_csv("google_stock_price.csv", squeeze = True)

In [163]:
pokemon.head(3)

0    Bulbasaur
1      Ivysaur
2     Venusaur
Name: Pokemon, dtype: object

#### To extract values from a Series, we can use indexing just as we do with lists.
#### For a single value, we can use seriesname[x], where x is the index position we want to extract. This returns the actual value, such as a string, int, or float.

#### To get multiple values at different index positions, we have two options: 1) provide a list of exact index positions, or 2) provide a range using `:`, i.e. slicing. When slicing we can use negative indexes as well. As with lists, extraction goes up to, but does not include, the second value provided in the slice. Both options return a new Series containing only the selected elements. When slicing, we can also provide a step value.

In [29]:
pokemon[1] #returns a single element at the given index position.

pokemon[[100, 200, 300]] #Takes in a list of index positions to include in the output.

pokemon[50:101] #2nd index position is not included in the result. 

pokemon[:50] #Does not include index position 50

pokemon[-30:] #Negative indexes allowed

pokemon[-30 : -10]

691     Clauncher
692     Clawitzer
693    Helioptile
694     Heliolisk
695        Tyrunt
696     Tyrantrum
697        Amaura
698       Aurorus
699       Sylveon
700      Hawlucha
701       Dedenne
702       Carbink
703         Goomy
704       Sliggoo
705        Goodra
706        Klefki
707      Phantump
708     Trevenant
709     Pumpkaboo
710     Gourgeist
Name: Pokemon, dtype: object

## Extract Values by Index Label

**index_col:** int, str, sequence of int / str, or False, default ``None``

`Column(s) to use as the row labels of the ``DataFrame``, either given as string name or column index. If a sequence of int / str is given, a MultiIndex is used.
Note: ``index_col=False`` can be used to force pandas to *not* use the first column as the index, e.g. when you have a malformed file with delimiters at the end of each line.`

In [33]:
pokemon = pd.read_csv("pokemon.csv", index_col = "Pokemon", squeeze = True)
pokemon.head(3)

Pokemon
Bulbasaur    Grass
Ivysaur      Grass
Venusaur     Grass
Name: Type, dtype: object

#### Even if the index labels are not numeric or generated by default, each element still has an index position. So even when the index labels are not numeric and sequential, we can access the elements of the Series by index position, as shown below.


In [34]:
pokemon[[100, 134]]

Pokemon
Electrode    Electric
Jolteon      Electric
Name: Type, dtype: object
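Recent pandas releases warn when integers passed to `[]` are treated as positions on a Series with a non-integer index. A more explicit way to express the same lookups is `.iloc` for positions and `.loc` for labels. The sketch below assumes a made-up three-row stand-in for the Series above.

```python
import pandas as pd

# Hypothetical three-row stand-in for the label-indexed pokemon Series.
types = pd.Series(
    ["Grass", "Fire", "Water"],
    index=["Bulbasaur", "Charmander", "Squirtle"],
    name="Type",
)

second = types.iloc[1]                # by position
first_and_last = types.iloc[[0, 2]]   # list of positions
by_label = types.loc["Squirtle"]      # by label

print(second, by_label)
print(first_and_last)
```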

#### As with index positions, we can use indexing and slicing with index labels. We can also pass a list of index labels to extract specific elements. The point to note is that when slicing with index labels, the second index label is included in the result (unlike slicing with index positions). When slicing, we can also provide a step size.

In [38]:
pokemon["Bulbasaur"]
pokemon["Ditto"]
#pokemon[["Charizard", "Jolteon"]]
pokemon[["Blastoise", "Venusaur", "Meowth"]] #For extracting multiple values, we could also use a list

#pokemon[["Pikachu", "Digimon"]]

pokemon["Bulbasaur" : "Pikachu"] #slicing syntax also works with index labels. In this case the 2nd index label is included in the result.
pokemon["Bulbasaur":"Pikachu":2]

Pokemon
Bulbasaur        Grass
Venusaur         Grass
Charmeleon        Fire
Squirtle         Water
Blastoise        Water
Metapod            Bug
Weedle             Bug
Beedrill           Bug
Pidgeotto       Normal
Rattata         Normal
Spearow         Normal
Ekans           Poison
Pikachu       Electric
Name: Type, dtype: object

#### If an index label used to fetch a value does not exist, we get an error (KeyError).
#### Also note that when using a list of index labels, all the labels should exist in the Series. Even if a single label does not exist, we shall get an Error
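
A small sketch of these two points, on made-up data (`types` is hypothetical). It also shows `reindex`, which is one way to ask for labels that may be missing without raising; whether NaN placeholders are acceptable depends on the use case.

```python
import pandas as pd

# Hypothetical toy data with string index labels.
types = pd.Series(
    ["Grass", "Fire", "Water", "Electric"],
    index=["Bulbasaur", "Charmander", "Squirtle", "Pikachu"],
    name="Type",
)

# Label slices include the end label.
inclusive = types.loc["Bulbasaur":"Squirtle"]

# A list containing a label that does not exist raises KeyError.
try:
    types.loc[["Pikachu", "Digimon"]]
    missing_raised = False
except KeyError:
    missing_raised = True

# reindex returns NaN for labels that are not present instead of raising.
tolerant = types.reindex(["Pikachu", "Digimon"])

print(inclusive.index.tolist())
print(missing_raised)
print(tolerant.isna().tolist())
```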

## The `.get()` Method on a `Series`

In [39]:
pokemon = pd.read_csv("pokemon.csv", index_col = "Pokemon", squeeze = True)
pokemon.sort_index(inplace = True) #Sorting helps fetching performance.
pokemon.head(3)

Pokemon
Abomasnow      Grass
Abra         Psychic
Absol           Dark
Name: Type, dtype: object

#### The get() method: Get item from object for given key (ex: DataFrame column). Returns default value if not found.
`Signature: pokemon.get(key, default=None)`
#### If the index position or label that we pass to get() does not exist, it returns the default value (None unless we specify one). No error is raised, unlike with the [] notation.
#### For the key, we could also provide a list of index positions or labels. However, if even a single label/position in the list does not exist in the Series, the whole result is the value of the `default` parameter (None if no default is specified), even if the other labels/positions are present.

In [40]:
pokemon.get(key = ["Moltres", "Meowth"]) #The get method could also take a list of index positions or labels.

Pokemon
Moltres      Fire
Meowth     Normal
Name: Type, dtype: object

In [41]:
pokemon.get(key = ["Moltres", "Mth"]) 

In [42]:
pokemon.get(key = ["Moltres", "Mth","Meowth"],default="HHOHOH") 

'HHOHOH'

In [199]:
pokemon.get(key = "Charizard", default = "This is not a Pokemon")

'Fire'

In [202]:
pokemon.get(key = "jksajk", default = "This is not a Pokemon")

'This is not a Pokemon'
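
The four cases above can be reproduced on a two-element Series. This is a sketch with made-up data; `"Digimon"` and `"Unknown"` are placeholder values chosen for illustration.

```python
import pandas as pd

# Hypothetical toy data.
types = pd.Series(["Fire", "Normal"], index=["Moltres", "Meowth"], name="Type")

found = types.get("Moltres")                       # label exists
fallback = types.get("Digimon", default="Unknown")  # label missing, default given
no_default = types.get("Digimon")                  # label missing, no default
partial = types.get(["Moltres", "Digimon"], default="Unknown")  # one label missing

print(found)
print(fallback)
print(no_default)
print(partial)
```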

## Math Methods on `Series` Objects

In [43]:
google = pd.read_csv("google_stock_price.csv", squeeze = True)
google.head(3)

0    50.12
1    54.10
2    54.65
Name: Stock Price, dtype: float64

#### The count() method: Return number of non-NA/null observations in the Series.

`Signature: google.count(level=None)`



In [44]:
google.count()

3012

In [45]:
len(google)

3012
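
`count()` and `len()` agree here because the stock price data has no missing values. The sketch below, on a made-up Series with two missing entries, shows how they can differ.

```python
import pandas as pd

# Hypothetical prices; NaN and None both count as missing values.
prices = pd.Series([50.12, float("nan"), 54.65, None])

non_null = prices.count()  # only non-NA values
total = len(prices)        # every element, missing or not

print(non_null, total)
```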

#### The sum() method: Return the sum of the values for the requested axis.

`Signature:
google.sum(
    axis=None,
    skipna=None,
    level=None,
    numeric_only=None,
    min_count=0,
    **kwargs,
)`

In [222]:
google.sum()

1006942.0000000002

#### The mean() method: Return the mean of the values for the requested axis.

`Signature:
google.mean(
    axis=None,
    skipna=None,
    level=None,
    numeric_only=None,
    **kwargs,
)`


In [223]:
google.mean()

334.31009296148744

In [224]:
google.sum() / google.count()

334.31009296148744
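
The `skipna` and `min_count` parameters in the signatures above only matter when values are missing. A minimal sketch on a made-up Series with one NaN:

```python
import pandas as pd

# Hypothetical data with one missing value.
values = pd.Series([1.0, 2.0, float("nan")])

total = values.sum()                  # NaN is skipped by default
strict_total = values.sum(skipna=False)  # NaN propagates into the result
needs_three = values.sum(min_count=3)    # fewer than 3 valid values -> NaN
average = values.mean()               # same as sum() / count()

print(total, strict_total, needs_three, average)
```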

#### The std() method: Return sample standard deviation over requested axis. Normalized by N-1 by default. This can be changed using the ddof argument.

`Signature : google.std(
    axis=None,
    skipna=None,
    level=None,
    ddof=1,
    numeric_only=None,
    **kwargs,
)`


In [225]:
google.std()

173.18720477113106
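
To see what the `ddof` argument changes, here is a sketch on eight made-up numbers whose mean is 5 and whose squared deviations add up to 32. With `ddof=1` the divisor is N-1 = 7; with `ddof=0` it is N = 8.

```python
import pandas as pd

# Hypothetical data: mean is 5, squared deviations sum to 32.
values = pd.Series([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])

sample_std = values.std()            # ddof=1: sqrt(32 / 7)
population_std = values.std(ddof=0)  # ddof=0: sqrt(32 / 8) = 2.0

print(sample_std)
print(population_std)
```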

#### The min() method: Return the minimum of the values for the requested axis.
`Signature: google.min(axis=None, skipna=None, level=None, numeric_only=None, **kwargs)`

#### The max() method: Return the maximum of the values for the requested axis.
`Signature: google.max(axis=None, skipna=None, level=None, numeric_only=None, **kwargs)`

#### The median() method: Return the median of the values for the requested axis.
`Signature:
google.median(
    axis=None,
    skipna=None,
    level=None,
    numeric_only=None,
    **kwargs,
)`


In [226]:
google.min()

49.950000000000003

In [227]:
google.max()

782.22000000000003

In [228]:
google.median()

283.315

#### The mode() method: Return the mode(s) of the dataset. Always returns Series even if only one value is returned.
`Signature: google.mode(dropna=True)`

In [229]:
google.mode()

0    291.21
dtype: float64
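
`mode()` returns a Series because there can be more than one most frequent value. A short sketch on made-up data:

```python
import pandas as pd

# Hypothetical data: 2 and 3 both appear twice.
tied = pd.Series([1, 2, 2, 3, 3])
single = pd.Series([1, 1, 2])

tied_modes = tied.mode()      # both tied values are returned
single_mode = single.mode()   # still a Series, with one element

print(tied_modes.tolist())
print(single_mode.tolist())
```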

#### The describe() method: Generate descriptive statistics. Descriptive statistics include those that summarize the central tendency, dispersion and shape of a dataset's distribution, excluding `NaN` values. Analyzes both numeric and object series, as well as `DataFrame` column sets of mixed data types. The output will vary depending on what is provided.

`Signature: google.describe(percentiles=None, include=None, exclude=None) -> ~FrameOrSeries`

In [230]:
google.describe()

count    3012.000000
mean      334.310093
std       173.187205
min        49.950000
25%       218.045000
50%       283.315000
75%       443.000000
max       782.220000
Name: Stock Price, dtype: float64
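
As the description says, the output depends on the data. The sketch below uses two made-up Series: a numeric one with custom `percentiles`, and a string one, for which `describe()` reports count, unique, top and freq instead.

```python
import pandas as pd

# Hypothetical numeric and text data.
numbers = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
labels = pd.Series(["Grass", "Grass", "Fire"])

numeric_summary = numbers.describe(percentiles=[0.1, 0.9])
text_summary = labels.describe()

print(numeric_summary)
print(text_summary)
```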

## The `.idxmax()` and `.idxmin()` Methods

In [46]:
google = pd.read_csv("google_stock_price.csv", squeeze = True)

In [47]:
google.max()

782.22

In [48]:
google.min()

49.95


#### The idxmax() method: Return the row label of the maximum value. If multiple values equal the maximum, the first row label with that value is returned. Raises ValueError if the Series is empty.

`Signature: google.idxmax(axis=0, skipna=True, *args, **kwargs)`

`axis : int, default 0
    For compatibility with DataFrame.idxmax. Redundant for application on Series.
skipna : bool, default True
    Exclude NA/null values. If the entire Series is NA, the result will be NA.
*args, **kwargs
    Additional arguments and keywords have no effect but might be accepted for compatibility with NumPy.`



In [235]:
google.idxmax()

3011

In [236]:
google[3011]

782.22000000000003

#### The idxmin() method: Return the row label of the minimum value. If multiple values equal the minimum, the first row label with that value is returned. Raises ValueError if the Series is empty.

`Signature: google.idxmin(axis=0, skipna=True, *args, **kwargs)`

`axis : int, default 0
    For compatibility with DataFrame.idxmin. Redundant for application on Series.
skipna : bool, default True
    Exclude NA/null values. If the entire Series is NA, the result will be NA.
*args, **kwargs
    Additional arguments and keywords have no effect but might be accepted for compatibility with NumPy.`



In [237]:
google.idxmin()

11

In [238]:
google[11]

49.950000000000003

In [239]:
google[google.idxmin()]

49.950000000000003
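
Both methods return an index label, not a position, which is easier to see when the labels are strings. A sketch on made-up data with a tie for the maximum:

```python
import pandas as pd

# Hypothetical data: the maximum 9 appears at labels "b" and "c".
scores = pd.Series([3, 9, 9, 1], index=["a", "b", "c", "d"])

top_label = scores.idxmax()     # first label holding the maximum
bottom_label = scores.idxmin()  # label holding the minimum

print(top_label, scores.loc[top_label])
print(bottom_label, scores.loc[bottom_label])
```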

## The `.value_counts()` Method

In [49]:
pokemon = pd.read_csv("pokemon.csv", index_col = "Pokemon", squeeze = True)
pokemon.head(3)

Pokemon
Bulbasaur    Grass
Ivysaur      Grass
Venusaur     Grass
Name: Type, dtype: object

#### The value_counts() method: Return a Series containing counts of unique values. The resulting object will be in descending order so that the first element is the most frequently-occurring element. Excludes NA values by default.

`Signature:
pokemon.value_counts(
    normalize=False,
    sort=True,
    ascending=False,
    bins=None,
    dropna=True,
)`

`normalize : bool, default False
    If True then the object returned will contain the relative frequencies of the unique values.
sort : bool, default True
    Sort by frequencies.
ascending : bool, default False
    Sort in ascending order.
bins : int, optional
    Rather than count values, group them into half-open bins, a convenience for ``pd.cut``, only works with numeric data.
dropna : bool, default True
    Don't include counts of NaN.`

In [242]:
pokemon.value_counts().sum()

721

In [243]:
pokemon.count()

721

In [246]:
pokemon.value_counts(ascending = True) #Gives results in reverse order.

Flying        3
Fairy        17
Steel        22
Ice          23
Ghost        23
Dragon       24
Fighting     25
Dark         28
Poison       28
Ground       30
Electric     36
Rock         41
Psychic      47
Fire         47
Bug          63
Grass        66
Normal       93
Water       105
Name: Type, dtype: int64
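
The `normalize` and `dropna` parameters listed above are not exercised in the cells. A minimal sketch on made-up data (`types` and `with_missing` are hypothetical):

```python
import pandas as pd

# Hypothetical toy data.
types = pd.Series(["Water", "Water", "Grass", "Fire"])
with_missing = pd.Series(["Water", None])

counts = types.value_counts()                # absolute counts, most frequent first
shares = types.value_counts(normalize=True)  # relative frequencies

counted_without_na = with_missing.value_counts().sum()          # None is excluded
counted_with_na = with_missing.value_counts(dropna=False).sum()  # None is counted

print(counts)
print(shares)
```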

## The `.apply()` Method

In [255]:
google = pd.read_csv("google_stock_price.csv", squeeze = True)
google.head(6)

0    50.12
1    54.10
2    54.65
3    52.38
4    52.95
5    53.90
Name: Stock Price, dtype: float64

In [256]:
def classify_performance(number):
    if number < 300:
        return "OK"
    elif 300 <= number < 650:
        return "Satisfactory"
    else:
        return "Incredible!"

#### The apply() method: Invoke function on values of Series. Can be ufunc (a NumPy function that applies to the entire Series) or a Python function that only works on single values.
`Signature: google.apply(func, convert_dtype=True, args=(), **kwds)`

`func : function
    Python function or NumPy ufunc to apply.
convert_dtype : bool, default True
    Try to find better dtype for elementwise function results. If
    False, leave as dtype=object.
args : tuple
    Positional arguments passed to func after the series value.
**kwds
    Additional keyword arguments passed to func.`

In [258]:
google.apply(classify_performance).tail()

3007    Incredible!
3008    Incredible!
3009    Incredible!
3010    Incredible!
3011    Incredible!
Name: Stock Price, dtype: object

In [259]:
google.head(6)

0    50.12
1    54.10
2    54.65
3    52.38
4    52.95
5    53.90
Name: Stock Price, dtype: float64

In [50]:
google.apply(lambda stock_price : stock_price + 1)

0        51.12
1        55.10
2        55.65
3        53.38
4        53.95
         ...  
3007    773.88
3008    772.07
3009    774.18
3010    772.61
3011    783.22
Name: Stock Price, Length: 3012, dtype: float64
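
The `args` parameter from the signature lets the function take extra positional arguments after the Series value. The sketch below is a variation on `classify_performance` with the thresholds passed in; `classify`, `low` and `high` are names introduced here for illustration, and the prices are made up. It also shows that simple arithmetic such as adding 1 can be written directly on the Series without `apply`.

```python
import pandas as pd

# Hypothetical prices, one per category.
prices = pd.Series([50.12, 300.0, 782.22])

def classify(price, low, high):
    """Label a price using thresholds supplied by the caller."""
    if price < low:
        return "OK"
    elif low <= price < high:
        return "Satisfactory"
    else:
        return "Incredible!"

# args are passed to classify after each Series value.
labels = prices.apply(classify, args=(300, 650))

# Vectorized arithmetic gives the same values as apply(lambda p: p + 1).
plus_one = prices + 1
plus_one_applied = prices.apply(lambda p: p + 1)

print(labels.tolist())
```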

## The `.map()` Method

In [51]:
pokemon_names = pd.read_csv("pokemon.csv", usecols = ["Pokemon"], squeeze = True)
pokemon_names.head(3)

0    Bulbasaur
1      Ivysaur
2     Venusaur
Name: Pokemon, dtype: object

In [52]:
pokemon_types = pd.read_csv("pokemon.csv", index_col = "Pokemon", squeeze = True)
pokemon_types.head(3)

Pokemon
Bulbasaur    Grass
Ivysaur      Grass
Venusaur     Grass
Name: Type, dtype: object

#### The map() method: Map values of Series according to input correspondence. Used for substituting each value in a Series with another value, that may be derived from a function, a `dict` or a `Series`. It returns a new Series with same index as the caller.
`Signature: pokemon_names.map(arg, na_action=None)`

`arg : function, collections.abc.Mapping subclass or Series
    Mapping correspondence.
na_action : {None, 'ignore'}, default None
    If 'ignore', propagate NaN values, without passing them to the mapping correspondence.`



In [53]:
pokemon_names.map(pokemon_types)

0        Grass
1        Grass
2        Grass
3         Fire
4         Fire
        ...   
716       Dark
717     Dragon
718       Rock
719    Psychic
720       Fire
Name: Pokemon, Length: 721, dtype: object

In [54]:
pokemon_names = pd.read_csv("pokemon.csv", usecols = ["Pokemon"], squeeze = True)
pokemon_types = pd.read_csv("pokemon.csv", index_col = "Pokemon", squeeze = True).to_dict()

In [55]:
pokemon_names.head()

0     Bulbasaur
1       Ivysaur
2      Venusaur
3    Charmander
4    Charmeleon
Name: Pokemon, dtype: object

In [56]:
pokemon_types

{'Bulbasaur': 'Grass',
 'Ivysaur': 'Grass',
 'Venusaur': 'Grass',
 'Charmander': 'Fire',
 'Charmeleon': 'Fire',
 'Charizard': 'Fire',
 'Squirtle': 'Water',
 'Wartortle': 'Water',
 'Blastoise': 'Water',
 'Caterpie': 'Bug',
 'Metapod': 'Bug',
 'Butterfree': 'Bug',
 'Weedle': 'Bug',
 'Kakuna': 'Bug',
 'Beedrill': 'Bug',
 'Pidgey': 'Normal',
 'Pidgeotto': 'Normal',
 'Pidgeot': 'Normal',
 'Rattata': 'Normal',
 'Raticate': 'Normal',
 'Spearow': 'Normal',
 'Fearow': 'Normal',
 'Ekans': 'Poison',
 'Arbok': 'Poison',
 'Pikachu': 'Electric',
 'Raichu': 'Electric',
 'Sandshrew': 'Ground',
 'Sandslash': 'Ground',
 'Nidoran': 'Poison',
 'Nidorina': 'Poison',
 'Nidoqueen': 'Poison',
 'Nidoran♂': 'Poison',
 'Nidorino': 'Poison',
 'Nidoking': 'Poison',
 'Clefairy': 'Fairy',
 'Clefable': 'Fairy',
 'Vulpix': 'Fire',
 'Ninetales': 'Fire',
 'Jigglypuff': 'Normal',
 'Wigglytuff': 'Normal',
 'Zubat': 'Poison',
 'Golbat': 'Poison',
 'Oddish': 'Grass',
 'Gloom': 'Grass',
 'Vileplume': 'Grass',
 'Paras': 'Bug'

In [57]:
pokemon_names.map(pokemon_types)

0        Grass
1        Grass
2        Grass
3         Fire
4         Fire
        ...   
716       Dark
717     Dragon
718       Rock
719    Psychic
720       Fire
Name: Pokemon, Length: 721, dtype: object
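
Every name in `pokemon_names` has an entry in `pokemon_types`, so no value is left unmapped above. The sketch below, on made-up data, shows what happens when a value has no entry in the `dict`, and that `map()` also accepts a function.

```python
import pandas as pd

# Hypothetical toy data; "Digimon" has no entry in the mapping.
names = pd.Series(["Bulbasaur", "Charmander", "Digimon"])
type_lookup = {"Bulbasaur": "Grass", "Charmander": "Fire"}

mapped = names.map(type_lookup)  # values missing from the dict become NaN
lengths = names.map(len)         # a function is called on each value

print(mapped.tolist())
print(lengths.tolist())
```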