## 298. Introduction
- NumPy is a great library for homogeneous numeric data with integer-based indexing, but on its own it is not well suited to the kind of Big Data we work with today
- Big Data needs data structures that can be easily customized
- Big Data comes in mixed types and can contain missing values that have to be handled
- We also need various functions and mathematical operations that can be applied to Big Data; that is where `Pandas` comes in
- The name `Pandas` is derived from `Panel Data`
- Examples of such data are stock prices, players' scores across matches, students' grades across exams, and so on
        NumPy                   Pandas
        Numeric data            Custom indexing
        Integer indexing        Mixed/missing data
                                Data manipulation
- Pandas provides two main data structures
    1. `Series`
        - to handle one-dimensional data
    2. `DataFrame`
        - to handle two-dimensional data
- Pandas uses arrays behind the scenes and is closely related to the NumPy library
- Several NumPy functions accept `Series` and `DataFrames` as arguments, so you can use Pandas together with NumPy as well
- Both `Series` and `DataFrames` allow us to easily select and manipulate data
- We can apply `map`- and `reduce`-style operations right out of the box
- We can perform various mathematical operations on Big Data
- We can also visualize the data in different formats
- All of this is built into `Pandas`
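
As a small illustration of the NumPy interoperability mentioned above, the sketch below passes a `Series` to two NumPy functions. The data and the names `scores`, `roots`, and `total` are made up for this example.

```python
import numpy as np
import pandas as pd

# Hypothetical scores with a custom string index
scores = pd.Series([100, 64, 49], index=['I1', 'I2', 'I3'])

roots = np.sqrt(scores)   # element-wise NumPy function; the result is still a Series
total = np.sum(scores)    # NumPy reduction over the Series values

print(roots)
print("total:", total)
```

The result of `np.sqrt` keeps the custom index, so the labels travel with the data through the NumPy call.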

## 299. Series
- A `Series` is an enhanced one-dimensional array
- While arrays use numeric `zero-based indexing`, a `Series` also supports `custom indexing`, for example with strings
- A `Series` also handles missing data: many `Series` functions skip missing values by default
- We can create a `Series` from a `list`, a `numpy.ndarray`, a `dict`, etc.
- The default index of a `Series` is numeric and starts from zero, but we can customize it
- We're going to create several `Series` of our own and explore functions such as `count()`, `mean()`, `min()`, `max()`, `std()`, `describe()`, and more
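
To make the missing-data point concrete, here is a minimal sketch that compares a plain NumPy mean with the `Series` equivalents. The values are invented, and `np.nan` is used here as the missing-value marker.

```python
import numpy as np
import pandas as pd

values = [4.6, np.nan, 4.8]          # np.nan marks a missing review

as_array = np.array(values)
as_series = pd.Series(values)

array_mean = np.mean(as_array)       # NaN propagates through the plain NumPy mean
series_mean = as_series.mean()       # skipna=True by default, so NaN is skipped
series_count = as_series.count()     # counts only the non-missing values

print("np.mean:", array_mean)
print("Series.mean():", series_mean)
print("Series.count():", series_count)
```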

## 300. Create Project
- To install `Pandas` from the command line, run
``` python
pip3 install pandas
```
- This will install `Pandas` for your Python environment

In [1]:
# pandas

## 301. Create and use Series
- We'll start exploring Pandas Series
- `pandas.Series(data=None, index=None, dtype: 'Dtype | None' = None, name=None, copy: 'bool | None' = None, fastpath: 'bool | lib.NoDefault' = <no_default>)`
    - One-dimensional ndarray with axis-labels (including time series)
    - `data` : array-like, Iterable, dict, or scalar value
        - Contains data stored in Series. If data is a dict, argument order is maintained.
    - `index` : array-like or Index (1d)
        - Values must be hashable and have the same length as `data`.
        - Non-unique index values are allowed. Will default to RangeIndex (0, 1, 2, ..., n) if not provided.
        - If data is dict-like and index is None, then the keys in the data are used as the index.
        - If the index is not None, the resulting Series is reindexed with the index values.
    - `dtype` : str, numpy.dtype, or ExtensionDtype, optional
        - Data type for the output Series. If not specified, this will be inferred from `data`.
    - `name` : Hashable, default None
        - The name to give to the Series.
    - `copy` : bool, default False
        - Copy input data. Only affects Series or 1d ndarray input
- `pandas.Series.count()`
    - return the number of Non-NA/null observations in the Series
- `pandas.Series.mean(axis:'Axis|None'=0, skipna:'bool'=True, numeric_only:'bool'=False, **kwargs, )`
    - return the mean of the values over the requested axis
- `pandas.Series.min(axis:'Axis|None'=0, skipna:'bool'=True, numeric_only:'bool'=False, **kwargs, )`
    - return the minimum of the values over the requested axis
- `pandas.Series.max(axis:'Axis|None'=0, skipna:'bool'=True, numeric_only:'bool'=False, **kwargs, )`
    - return the maximum of the values over the requested axis
- `pandas.Series.std(axis:'Axis|None'=0, skipna:'bool'=True, ddof:'int'=1,  numeric_only:'bool'=False, **kwargs, )`
    - return sample standard deviation over the requested axis
    - Normalized by N-1 by default. This can be changed using the ddof argument.
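
The effect of the `ddof` argument can be sketched with the same review values that are used in the cells below; the variable names are only for illustration.

```python
import pandas as pd

reviews = pd.Series([4.6, 4.4, 4.8, 5])

sample_std = reviews.std()             # ddof=1 (default): divide by N-1
population_std = reviews.std(ddof=0)   # ddof=0: divide by N

print("sample std (ddof=1):", sample_std)
print("population std (ddof=0):", population_std)
```

With four values the squared deviations from the mean 4.7 add up to 0.2, so the two results are the square roots of 0.2/3 and 0.2/4 respectively.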


In [2]:
# pandas
# series_demo.py
import pandas as pd

reviews = pd.Series([4.6, 4.4, 4.8, 5])
print(reviews) # 1st column is the index (starting from 0), 2nd column is the data
print("reviews[0]:", reviews[0]) # accessing Series element using index

0    4.6
1    4.4
2    4.8
3    5.0
dtype: float64
reviews[0]: 4.6


In [3]:
print("reviews.count():", reviews.count()) # count of non-null elements in Series
print("reviews.mean():", reviews.mean()) # mean of non-null elements in Series
print("reviews.min():", reviews.min()) # min of non-null elements in Series
print("reviews.max():", reviews.max()) # max of non-null elements in Series
print("reviews.std():", reviews.std()) # sample standard deviation of non-null elements in Series

reviews.count(): 4
reviews.mean(): 4.7
reviews.min(): 4.4
reviews.max(): 5.0
reviews.std(): 0.25819888974716104


## 302. Use Custom indices
- Previously, you saw that a Pandas Series generates a default index that starts at 0 and goes up to length-1
- Instead of the default index, we can use a custom index
- We can also use a dict to initialize a Series: instead of defining a Series from a list and then passing the index, the keys become the index labels and the values become the values of the Series
- `pandas.Series.values`
    - returns the values of the Series as an ndarray
- `pandas.Series.index`
    - returns the immutable sequence used for indexing and alignment
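
When a dict is passed together with an explicit `index`, the Series is reindexed to those labels. The following is a minimal sketch with made-up ratings; the label `'go'` is deliberately not a key of the dict.

```python
import pandas as pd

ratings = {'python': 4.6, 'java': 4.4, 'django': 4.8}

# Only labels listed in `index` are kept; 'go' is not a key, so it becomes NaN
selected = pd.Series(ratings, index=['java', 'python', 'go'])
print(selected)
```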

In [4]:
reviews = pd.Series([4.6, 4.4, 4.8, 5], index=['python', 'java', 'django', 'devops'])
print(reviews)

reviews = pd.Series({'python': 4.6, 'java':4.4, 'django':4.8, 'devops':5}) # keys will become the indices
print(reviews)

python    4.6
java      4.4
django    4.8
devops    5.0
dtype: float64
python    4.6
java      4.4
django    4.8
devops    5.0
dtype: float64


In [5]:
print("reviews['python']:", reviews['python']) # access Series elements using custom index
print("reviews.python:", reviews.python) # access Series element using dot operator
print("reviews.java:", reviews.java)
print("reviews.django:", reviews.django)

reviews['python']: 4.6
reviews.python: 4.6
reviews.java: 4.4
reviews.django: 4.8


In [6]:
print(reviews.values) # returns an ndarray holding only the values of the Series
print(reviews.index) # returns an immutable sequence used for indexing & alignment

[4.6 4.4 4.8 5. ]
Index(['python', 'java', 'django', 'devops'], dtype='object')


## 303. Series of String
- You'll learn how to use string data within a Pandas Series
- Strings are stored with the `object` dtype by default in the pandas version used here
- `pandas.Series.str.upper()`
    - Convert strings in the Series/Index to uppercase.
    - Equivalent to :meth:`str.upper`.
    - returns Series or Index of object
    - returns NaN for non-string elements
- `pandas.Series.str.contains(pat, case: 'bool' = True, flags: 'int' = 0, na=None, regex: 'bool' = True)`
    - Test if pattern or regex is contained within a string of a Series or Index.
    - return boolean Series or Index based on whether a given pattern or regex is contained within a string of a Series or Index
    - returns True if it is present, and returns False, if it is not present
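
The `case`, `na`, and `regex` parameters listed above can be tried out as follows. This is only a sketch: the course names, the `None` entry, and the second Series are invented for the example.

```python
import pandas as pd

courses = pd.Series(['Java', 'Python', None, 'AWS'])   # None stands for a missing name

# case=False ignores letter case; na=False reports missing entries as False
has_a = courses.str.contains('a', case=False, na=False)

# regex=False treats the pattern as literal text instead of a regular expression
has_dot = pd.Series(['a.b', 'acb']).str.contains('.', regex=False)

print(has_a)
print(has_dot)
```

Without `regex=False`, the pattern `'.'` would be read as "any character" and match both strings.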

In [7]:
courses = pd.Series(['Java', 'Python', 'AWS'])
print(courses)
print(courses.str.upper())
print(courses.str.contains('y'))

0      Java
1    Python
2       AWS
dtype: object
0      JAVA
1    PYTHON
2       AWS
dtype: object
0    False
1     True
2    False
dtype: bool


## 304. Describe
- We'll introduce the `describe()` method on a Series
- `pandas.Series.describe(percentiles=None, include=None, exclude=None)`
    - returns descriptive statistics for the given Series
    - it ignores `NaN` values
    - it also gives percentiles
    - by default, pandas displays values with a precision of 6 decimal places
    - `Percentile`
        - the percentile rank of a value tells us the percentage of values in a dataset that are equal to or below the given value
        - `25th Percentile` :
            - also known as the first, or lower, quartile
            - the 25th percentile is the value below which 25% of the values lie and above which 75% of the values lie
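
As a worked example of how a percentile is obtained, the sketch below recomputes the 25th percentile of four review values by hand, assuming the default linear interpolation of `Series.quantile()`. The custom percentiles passed to `describe()` are arbitrary choices for illustration.

```python
import pandas as pd

reviews = pd.Series([4.6, 4.4, 4.8, 5])

# Sorted values: 4.4, 4.6, 4.8, 5.0
# The 25th percentile sits at position 0.25 * (4 - 1) = 0.75, between 4.4 and 4.6
q25 = reviews.quantile(0.25)          # default linear interpolation
by_hand = 4.4 + 0.75 * (4.6 - 4.4)

# describe() accepts custom percentiles as fractions between 0 and 1
summary = reviews.describe(percentiles=[0.1, 0.9])

print("25th percentile:", q25, "by hand:", by_hand)
print(summary)
```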

In [8]:
reviews = pd.Series([4.6, 4.4, 4.8, 5])
print(reviews)
print("reviews.describe():\n", reviews.describe())
# only the value 4.4 lies below the 25th percentile, which is 4.55
# half of the data lies below and half above the 50th percentile, which is 4.70

0    4.6
1    4.4
2    4.8
3    5.0
dtype: float64
reviews.describe():
 count    4.000000
mean     4.700000
std      0.258199
min      4.400000
25%      4.550000
50%      4.700000
75%      4.850000
max      5.000000
dtype: float64


## 305. DataFrame
- A DataFrame is an improved two-dimensional array
- DataFrames allow custom row and column indexing
- They provide various operations required for data science projects
- Each column in a DataFrame is a Series
- This is a DataFrame of cricket players' scores across matches
            Kohli  Rohit  Surya  Jadeja
        I1    100    100     77      99
        I2     50     88    110     120
        I3     70      0      0       8
- Each row holds the scores of all players in one match, and each column holds one player's scores across matches
- DataFrames also handle missing data, just like Series

## 306. Create DataFrame
- We'll start using `DataFrames`
- A DataFrame created from a dict gets a default index that starts at 0 and goes up to length-1
- Each key in the dict becomes a column in the DataFrame
- The scores of each player are in the respective column
- We have three rows, representing the three scores of each player
- You can also set a custom index on a `DataFrame` by passing the additional parameter `index`
- The number of items in `index` must be the same as the number of rows, otherwise you'll get a `ValueError`
- `pd.DataFrame(data=None, index: 'Axes | None' = None, columns: 'Axes | None' = None, dtype: 'Dtype | None' = None, copy: 'bool | None' = None, )`
    - Two-dimensional, size-mutable, potentially heterogeneous tabular data.
    - Data structure also contains labeled axes (rows and columns).
    - Arithmetic operations align on both row and column labels.
    - Can be thought of as a dict-like container for Series objects.
    - The primary pandas data structure.
    - `data` : ndarray (structured or homogeneous), Iterable, dict, or DataFrame
        - Dict can contain Series, arrays, constants, dataclass or list-like objects.
        - If data is a dict, column order follows insertion-order.
        - If a dict contains Series which have an index defined, it is aligned by its index.
        - This alignment also occurs if data is a Series or a DataFrame itself.
        - Alignment is done on Series/DataFrame inputs.
        - If data is a list of dicts, column order follows insertion-order.
    - `index` : Index or array-like
        - Index to use for resulting frame.
        - Will default to `RangeIndex` if no indexing information part of input data and no index provided.
    - `columns` : Index or array-like
        - Column labels to use for resulting frame when data does not have them, defaulting to `RangeIndex(0, 1, 2, ..., n)`.
        - If data contains column labels, will perform column selection instead.
    - `dtype` : dtype, default None
        - Data type to force.
        - Only a single dtype is allowed.
        - If None, infer.
    - `copy` : bool or None, default None
        - Copy data from inputs.
        - For dict data, the default of None behaves like ``copy=True``.  For DataFrame or 2d ndarray input, the default of None behaves like ``copy=False``.
        - If data is a dict containing one or more Series (possibly of different dtypes), ``copy=False`` will ensure that these inputs are not copied.
- `pandas.DataFrame.index`
    - an attribute used to read or set the labels (custom index) of a DataFrame
    - The index (row labels) of the DataFrame.
    - The index of a DataFrame is a series of labels that identify each row.
    - The labels can be integers, strings, or any other hashable type.
    - The index is used for label-based access and alignment, and can be accessed or modified using this attribute.
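
The `ValueError` mentioned above can be reproduced with a short sketch. The dict is a trimmed-down version of the scores data, and `error_message` is just a helper name for this example.

```python
import pandas as pd

scores_dict = {'Kohli': [100, 50, 70], 'Rohit': [100, 88, 0]}

error_message = None
try:
    # Three rows of data but only two index labels
    pd.DataFrame(scores_dict, index=['I1', 'I2'])
except ValueError as err:
    error_message = str(err)

print("ValueError:", error_message)
```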


In [9]:
# dataframes_demo.py
import pandas as pd

scores_dict = {'Kohli': [100, 50, 70], 'Rohit':[100, 88, 0], \
               'Surya':[77, 110, 0], 'Jadeja':[99, 120, 8]}
scores = pd.DataFrame(scores_dict) # creating DataFrame from dict
print(scores)

   Kohli  Rohit  Surya  Jadeja
0    100    100     77      99
1     50     88    110     120
2     70      0      0       8


In [10]:
import pandas as pd

scores_dict = {'Kohli': [100, 50, 70], 'Rohit':[100, 88, 0], \
               'Surya':[77, 110, 0], 'Jadeja':[99, 120, 8]}
# scores = pd.DataFrame(scores_dict) # creating DataFrame from dict
scores = pd.DataFrame(scores_dict, index=['I1', 'I2', 'I3']) # setting custom index
print(scores)

    Kohli  Rohit  Surya  Jadeja
I1    100    100     77      99
I2     50     88    110     120
I3     70      0      0       8


In [11]:
scores_dict = {'Kohli': [100, 50, 70], 'Rohit':[100, 88, 0], \
               'Surya':[77, 110, 0], 'Jadeja':[99, 120, 8]}
scores = pd.DataFrame(scores_dict) # creating DataFrame from dict
scores.index = ['I1', 'I2', 'I3'] # using index attribute of DataFrame to set custom index
print(scores)

    Kohli  Rohit  Surya  Jadeja
I1    100    100     77      99
I2     50     88    110     120
I3     70      0      0       8


## 307. Access Columns and Rows
- You'll learn how to access the column-data and rows within a DataFrame
- Each column in a `DataFrame` is a `Series`
- `pandas.DataFrame.loc['index']`
    - takes index labels (here, the string-based custom index), not integer positions
    - Access a group of rows and columns by label(s) or a boolean array.
    - ``.loc[]`` is primarily label based, but may also be used with a boolean array.
    - it raises
        - `KeyError`
            - If any items are not found.
        - `IndexingError`
            - If an indexed key is passed and its index is unalignable to the frame index.
- `pandas.DataFrame.iloc[index]`
    - takes integer positions (as in the default index), not index labels
    - Purely integer-location based indexing for selection by position.
    - ``.iloc[]`` is primarily integer position based (from ``0`` to ``length-1`` of the axis), but may also be used with a boolean array.
    - ``.iloc`` will raise ``IndexError`` if a requested indexer is out-of-bounds, except *slice* indexers which allow out-of-bounds indexing (this conforms with python/numpy *slice* semantics).
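
The error behaviour described for `loc` and `iloc` can be checked with a small sketch. The label `'I9'` and the position `5` are chosen only because they do not exist in this three-row frame.

```python
import pandas as pd

scores = pd.DataFrame({'Kohli': [100, 50, 70], 'Rohit': [100, 88, 0]},
                      index=['I1', 'I2', 'I3'])

try:
    scores.loc['I9']              # label 'I9' does not exist
    loc_error = None
except KeyError as err:
    loc_error = type(err).__name__

try:
    scores.iloc[5]                # position 5 is out of bounds for 3 rows
    iloc_error = None
except IndexError as err:
    iloc_error = type(err).__name__

# An out-of-bounds slice does not raise; it simply returns the rows that exist
sliced = scores.iloc[1:10]

print(loc_error, iloc_error, len(sliced))
```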


In [12]:
scores_dict = {'Kohli': [100, 50, 70], 'Rohit':[100, 88, 0], \
               'Surya':[77, 110, 0], 'Jadeja':[99, 120, 8]}
scores = pd.DataFrame(scores_dict) # creating DataFrame from dict
scores.index = ['I1', 'I2', 'I3'] # using index attribute of DataFrame to set custom index
print("scores:\n", scores)

scores:
     Kohli  Rohit  Surya  Jadeja
I1    100    100     77      99
I2     50     88    110     120
I3     70      0      0       8


In [13]:
# to access column data
print("Kohli's Scores")
print("scores['Kohli']:\n", scores['Kohli']) # returns the named column as a Series
print("scores.Kohli:\n", scores.Kohli) # returns the column using its name as an attribute; works only if the name is a valid Python identifier

Kohli's Scores
scores['Kohli']:
 I1    100
I2     50
I3     70
Name: Kohli, dtype: int64
scores.Kohli:
 I1    100
I2     50
I3     70
Name: Kohli, dtype: int64


In [14]:
# to access row data
print("scores.loc['I1']:\n", scores.loc['I1']) # returns a row as a Series whose index is specified as custom string index
print("scores.iloc[0]:\n", scores.iloc[0]) # returns a row as a Series whose index is specified as default integer index

scores.loc['I1']:
 Kohli     100
Rohit     100
Surya      77
Jadeja     99
Name: I1, dtype: int64
scores.iloc[0]:
 Kohli     100
Rohit     100
Surya      77
Jadeja     99
Name: I1, dtype: int64


## 308. Use slicing and Lists
- The `loc[]` and `iloc[]` indexers allow us to slice rows and to select specific rows by passing in a list
- `Note`: a label-based slice in `loc` includes the end label, unlike traditional slicing
    - An integer-position slice in `iloc` is not inclusive; it excludes the end position
- This is the key difference between the two: label-based slices include the last element, and position-based slices exclude it
- We can also select specific rows by passing a list to `loc` or `iloc`
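
The inclusive and exclusive behaviour comes from slicing by label versus slicing by position, not from the labels being strings. A minimal sketch with a hypothetical frame whose row labels are the integers 10, 20, and 30:

```python
import pandas as pd

# Hypothetical frame whose row labels are integers that differ from the positions
runs = pd.DataFrame({'Kohli': [100, 50, 70]}, index=[10, 20, 30])

by_label = runs.loc[10:20]      # label slice: includes the end label 20
by_position = runs.iloc[0:1]    # position slice: excludes position 1

print(by_label)
print(by_position)
```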

In [15]:
# to access row data
# print("scores.loc['I1']:\n", scores.loc['I1']) # returns a row as a Series whose index is specified as custom string index
# print("scores.iloc[0]:\n", scores.iloc[0]) # returns a row as a Series whose index is specified as default integer index

print(scores.loc['I1':'I3']) # slicing using string based index - inclusive
print(scores.iloc[0:2]) # slicing using integer based index - excludes last element

print(scores.loc['I1':]) # slicing using string based index till end
print(scores.iloc[0:]) # slicing using integer based index till end

print(scores.loc[['I1', 'I3']]) # using list of index as parameter to loc to select specific rows
print(scores.iloc[[0, 1]]) # using list of integer positions as parameter to iloc to select specific rows

    Kohli  Rohit  Surya  Jadeja
I1    100    100     77      99
I2     50     88    110     120
I3     70      0      0       8
    Kohli  Rohit  Surya  Jadeja
I1    100    100     77      99
I2     50     88    110     120
    Kohli  Rohit  Surya  Jadeja
I1    100    100     77      99
I2     50     88    110     120
I3     70      0      0       8
    Kohli  Rohit  Surya  Jadeja
I1    100    100     77      99
I2     50     88    110     120
I3     70      0      0       8
    Kohli  Rohit  Surya  Jadeja
I1    100    100     77      99
I3     70      0      0       8
    Kohli  Rohit  Surya  Jadeja
I1    100    100     77      99
I2     50     88    110     120


## 309. Getting a subset
- You'll learn how to select a subset of rows and columns, instead of getting all the rows and all the columns
- You can combine the slicing and list syntax to retrieve a subset of rows and columns
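
Two further combinations are sketched below, under the assumption that the same scores data is used: a label slice over the columns, and a single row label combined with a list of columns. The names `first_three` and `i2_pair` are illustrative only.

```python
import pandas as pd

scores = pd.DataFrame({'Kohli': [100, 50, 70], 'Rohit': [100, 88, 0],
                       'Surya': [77, 110, 0], 'Jadeja': [99, 120, 8]},
                      index=['I1', 'I2', 'I3'])

# ':' keeps every row; the column slice is label based, so 'Surya' is included
first_three = scores.loc[:, 'Kohli':'Surya']

# A single row label with a list of columns returns a Series
i2_pair = scores.loc['I2', ['Rohit', 'Jadeja']]

print(first_three)
print(i2_pair)
```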


In [16]:
print(scores)
print(scores.loc['I1':'I2', ['Kohli', 'Surya']]) # getting subset of scores for only Kohli and Surya
print(scores.iloc[[0, 2], 0:3]) # getting subset of scores rows 0 & 2, and cols 0 to 2

# you can select specific [rows, cols] with loc and iloc
print(scores.loc[['I1', 'I3'], ['Kohli', 'Surya']]) # using lists of labels as parameters to loc to select specific [rowsList, colsList]
print(scores.iloc[[0, 1], [0, 3]]) # using lists of integer positions as parameters to iloc to select specific [rowsList, colsList]

    Kohli  Rohit  Surya  Jadeja
I1    100    100     77      99
I2     50     88    110     120
I3     70      0      0       8
    Kohli  Surya
I1    100     77
I2     50    110
    Kohli  Rohit  Surya
I1    100    100     77
I3     70      0      0
    Kohli  Surya
I1    100     77
I3     70      0
    Kohli  Jadeja
I1    100      99
I2     50     120


## 310. Query for data
- You'll learn how to use boolean indexing, that is, how to pass a criterion that decides which data is selected
- You can pass a criterion as the index to a DataFrame; it displays only the values that satisfy the criterion and `NaN` everywhere else
- `Boolean Indexing` : passing a criterion to a DataFrame, based on which data is selected
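
Boolean indexing comes in two flavours, sketched below on the same scores data. A boolean DataFrame masks individual cells, which gives the same result as `DataFrame.where()`, while a boolean Series built from one column filters whole rows. The threshold values are arbitrary.

```python
import pandas as pd

scores = pd.DataFrame({'Kohli': [100, 50, 70], 'Rohit': [100, 88, 0],
                       'Surya': [77, 110, 0], 'Jadeja': [99, 120, 8]},
                      index=['I1', 'I2', 'I3'])

masked = scores[scores > 90]               # element-wise: non-matching cells become NaN
same_as_where = scores.where(scores > 90)  # explicit form of the same masking

# A boolean Series built from one column filters whole rows instead
kohli_above_60 = scores[scores['Kohli'] > 60]

print(masked.equals(same_as_where))
print(kohli_above_60)
```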

In [17]:
print(scores)
print(scores[scores>90]) # displays only the values where the criterion is True, otherwise displays NaN
print(scores[(scores>=80) & (scores<=90)]) # passing a combination of conditions

    Kohli  Rohit  Surya  Jadeja
I1    100    100     77      99
I2     50     88    110     120
I3     70      0      0       8
    Kohli  Rohit  Surya  Jadeja
I1  100.0  100.0    NaN    99.0
I2    NaN    NaN  110.0   120.0
I3    NaN    NaN    NaN     NaN
    Kohli  Rohit  Surya  Jadeja
I1    NaN    NaN    NaN     NaN
I2    NaN   88.0    NaN     NaN
I3    NaN    NaN    NaN     NaN


## 311. Pick a Cell
- You'll learn how to access data in a particular cell in a DataFrame using `at` and `iat`
- You can also assign values to the cells referred to by `at` or `iat` to overwrite the current data
- `pandas.DataFrame.at[rowLabel, colLabel]`
    - access a single value for a row/column label pair.
    - Similar to ``loc``, in that both provide label-based lookups.
    - Use ``at`` if you only need to get or set a single value in a DataFrame or Series.
    - raises
        - `KeyError`
            - If getting a value and 'label' does not exist in a DataFrame or Series.
        - `ValueError`
            - If row/column label pair is not a tuple or if any label from the pair is not a scalar for DataFrame.
            - If label is list-like (*excluding* NamedTuple) for Series.
- `pandas.DataFrame.iat[rowInteger, colInteger]`
    - Access a single value for a row/column pair by integer position.
    - Similar to ``iloc``, in that both provide integer-based lookups.
    - Use ``iat`` if you only need to get or set a single value in a DataFrame or Series.
    - raises `IndexError` : When integer position is out of bounds.
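
Since `at` and `iat` work on a Series as well as on a DataFrame, the sketch below uses both and also triggers the two documented errors. The label `'I9'` and the position `5` are picked only because they do not exist in this frame.

```python
import pandas as pd

scores = pd.DataFrame({'Kohli': [100, 50, 70], 'Rohit': [100, 88, 0]},
                      index=['I1', 'I2', 'I3'])

kohli = scores['Kohli']          # a column is a Series
by_label = kohli.at['I2']        # Series.at takes a single label
by_position = kohli.iat[2]       # Series.iat takes a single integer position

try:
    scores.at['I9', 'Kohli']     # reading a label that does not exist
    at_error = None
except KeyError as err:
    at_error = type(err).__name__

try:
    scores.iat[5, 0]             # row position 5 is out of bounds
    iat_error = None
except IndexError as err:
    iat_error = type(err).__name__

print(by_label, by_position, at_error, iat_error)
```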

In [18]:
print(scores)

print("scores.at['I2', 'Kohli']:", scores.at['I2', 'Kohli']) # returns only single value at specified row, col using string/label index
print("scores.iat[2, 0]:", scores.iat[2, 0]) # returns only single value at specified row, col using integer index

scores.at['I2', 'Kohli'] = 150 # update a value using `at` attribute
print("Updated scores.at['I2', 'Kohli']:", scores.at['I2', 'Kohli'])

scores.iat[2, 0] = 200 # update a value using `iat` attribute
print("Updated scores.iat[2, 0]:", scores.iat[2, 0])


    Kohli  Rohit  Surya  Jadeja
I1    100    100     77      99
I2     50     88    110     120
I3     70      0      0       8
scores.at['I2', 'Kohli']: 50
scores.iat[2, 0]: 70
Updated scores.at['I2', 'Kohli']: 150
Updated scores.iat[2, 0]: 200


## 312. Describing and Transposing
- You'll learn how to get some statistics out of our DataFrame
- By default, pandas displays values with a precision of 6 decimal places
- But you can change it using `pandas.set_option()`
- `pandas.DataFrame.describe()`
    - returns descriptive statistics for a DataFrame
- `pandas.DataFrame.T`
    - returns : DataFrame, The transposed DataFrame.
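
If the display precision should change only for part of a script, `pandas.option_context()` can be used instead of `pandas.set_option()`. This is a minimal sketch on a reduced version of the scores data; the names `before`, `inside`, and `after` exist only to show that the setting is restored.

```python
import pandas as pd

scores = pd.DataFrame({'Rohit': [100, 88, 0], 'Surya': [77, 110, 0]},
                      index=['I1', 'I2', 'I3'])

before = pd.get_option('display.precision')

# The precision is changed only inside the `with` block
with pd.option_context('display.precision', 2):
    inside = pd.get_option('display.precision')
    print(scores.describe())

after = pd.get_option('display.precision')
print(before, inside, after)
```

The option affects only how numbers are printed; the values stored in the DataFrame are not rounded.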

In [19]:
print(scores)
print(scores.mean()) # returns mean for each column in DataFrame
print(scores.describe()) # returns descriptive statistics for each column

    Kohli  Rohit  Surya  Jadeja
I1    100    100     77      99
I2    150     88    110     120
I3    200      0      0       8
Kohli     150.000000
Rohit      62.666667
Surya      62.333333
Jadeja     75.666667
dtype: float64
       Kohli       Rohit       Surya      Jadeja
count    3.0    3.000000    3.000000    3.000000
mean   150.0   62.666667   62.333333   75.666667
std     50.0   54.601587   56.447616   59.534304
min    100.0    0.000000    0.000000    8.000000
25%    125.0   44.000000   38.500000   53.500000
50%    150.0   88.000000   77.000000   99.000000
75%    175.0   94.000000   93.500000  109.500000
max    200.0  100.000000  110.000000  120.000000


In [20]:
pd.set_option('display.precision', 2) # sets precision to 2 decimals
print(scores.describe()) # print descriptive statistics

       Kohli   Rohit   Surya  Jadeja
count    3.0    3.00    3.00    3.00
mean   150.0   62.67   62.33   75.67
std     50.0   54.60   56.45   59.53
min    100.0    0.00    0.00    8.00
25%    125.0   44.00   38.50   53.50
50%    150.0   88.00   77.00   99.00
75%    175.0   94.00   93.50  109.50
max    200.0  100.00  110.00  120.00


In [21]:
print(scores.T) # transpose of a DataFrame

         I1   I2   I3
Kohli   100  150  200
Rohit   100   88    0
Surya    77  110    0
Jadeja   99  120    8


In [22]:
print(scores.T.describe()) # returns descriptive statistics on the transposed DataFrame

           I1      I2      I3
count    4.00    4.00    4.00
mean    94.00  117.00   52.00
std     11.34   25.74   98.74
min     77.00   88.00    0.00
25%     93.50  104.50    0.00
50%     99.50  115.00    4.00
75%    100.00  127.50   56.00
max    100.00  150.00  200.00


## 313. Sorting
- You'll learn how to sort the data in DataFrames using col index, row index, and also the values
- `pandas.DataFrame.sort_index(*, axis: 'Axis' = 0, level: 'IndexLabel | None' = None, ascending: 'bool | Sequence[bool]' = True, inplace: 'bool' = False, kind: 'SortKind' = 'quicksort', na_position: 'NaPosition' = 'last', sort_remaining: 'bool' = True, ignore_index: 'bool' = False, key: 'IndexKeyFunc | None' = None, )`
    - by default, sorts by the row index in ascending order; can also sort by the column labels
    - Sort object by labels (along an axis).
    - The original DataFrame sorted by the labels or None if ``inplace=True``
    - `axis` : {0 or 'index', 1 or 'columns'}, default 0
        - The axis along which to sort.
        - The value 0 identifies the rows, and 1 identifies the columns.
    - `level` : int or level name or list of ints or list of level names
        - If not None, sort on values in specified index level(s).
    - `ascending` : bool or list-like of bools, default True
        - Sort ascending vs. descending.
        - When the index is a MultiIndex the sort direction can be controlled for each level individually.
    - `inplace` : bool, default False
        - Whether to modify the DataFrame rather than creating a new one.
    - `kind` : {'quicksort', 'mergesort', 'heapsort', 'stable'}, default 'quicksort'
        - Choice of sorting algorithm.
        - See also :func:`numpy.sort` for more information.
        - `mergesort` and `stable` are the only stable algorithms.
        - For DataFrames, this option is only applied when sorting on a single column or label.
    - `na_position` : {'first', 'last'}, default 'last'
        - Puts NaNs at the beginning if `first`; `last` puts NaNs at the end.
        - Not implemented for MultiIndex.
    - `sort_remaining` : bool, default True
        - If True and sorting by level and index is multilevel, sort by other levels too (in order) after sorting by specified level.
    - `ignore_index` : bool, default False
        - If True, the resulting axis will be labeled 0, 1, …, n - 1.
    - `key` : callable, optional
        - If not None, apply the key function to the index values before sorting.
        - This is similar to the `key` argument in the builtin :meth:`sorted` function, with the notable difference that this `key` function should be *vectorized*.
        - It should expect an ``Index`` and return an ``Index`` of the same shape. For MultiIndex inputs, the key is applied *per level*.
- `pandas.DataFrame.sort_values(by: 'IndexLabel', *, axis: 'Axis' = 0, ascending: 'bool | list[bool] | tuple[bool, ...]' = True, inplace: 'bool' = False, kind: 'SortKind' = 'quicksort', na_position: 'str' = 'last', ignore_index: 'bool' = False, key: 'ValueKeyFunc | None' = None, )`
    - sorts the values along the given axis, ordered by the columns or rows named in `by`
    - by default, sorts in ascending order
    - returns DataFrame with sorted values or None if ``inplace=True``
    - `by` : str or list of str
        - Name or list of names to sort by.
        - if `axis` is 0 or `'index'` then `by` may contain index levels and/or column labels.
        - if `axis` is 1 or `'columns'` then `by` may contain column levels and or index labels.
    - `axis` : "{0 or 'index', 1 or 'columns'}", default 0
        - Axis to be sorted.
    - `ascending` : bool or list of bool, default True
        - Sort ascending vs. descending. Specify list for multiple sort orders.
        - If this is a list of bools, must match the length of the by.
    - `inplace` : bool, default False
        - If True, perform operation in-place.
    - `kind` : {'quicksort', 'mergesort', 'heapsort', 'stable'}, default 'quicksort'
        - Choice of sorting algorithm. See also :func:`numpy.sort` for more information.
        - `mergesort` and `stable` are the only stable algorithms.
        - For DataFrames, this option is only applied when sorting on a single column or label.
    - `na_position` : {'first', 'last'}, default 'last'
        - Puts NaNs at the beginning if `first`; `last` puts NaNs at the end.
    - `ignore_index` : bool, default False
        - If True, the resulting axis will be labeled 0, 1, …, n - 1.
    - `key` : callable, optional
        - Apply the key function to the values before sorting. This is similar to the `key` argument in the built-in :meth:`sorted` function, with the notable difference that this `key` function should be *vectorized*.
        - It should expect a ``Series`` and return a Series with the same shape as the input.
        - It will be applied to each column in `by` independently.
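
Two of the parameters above are sketched here: sorting rows by the values of one column (the default `axis=0`), and a vectorized `key` function for `sort_index()`. The frame `mixed` with mixed-case labels is hypothetical and exists only to make the effect of `key` visible.

```python
import pandas as pd

scores = pd.DataFrame({'Kohli': [100, 50, 70], 'Surya': [77, 110, 0]},
                      index=['I1', 'I2', 'I3'])

# Default axis=0: reorder the rows by the values in the 'Surya' column
by_surya = scores.sort_values(by='Surya')

# Hypothetical mixed-case row labels to show the vectorized `key` argument
mixed = pd.DataFrame({'runs': [1, 2, 3]}, index=['a', 'B', 'c'])
plain = mixed.sort_index()                                    # uppercase sorts before lowercase
ignore_case = mixed.sort_index(key=lambda idx: idx.str.lower())

print(by_surya)
print(plain.index.tolist(), ignore_case.index.tolist())
```

The `key` function receives the whole `Index` at once and returns a transformed `Index`, which is what "vectorized" means here.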



In [23]:
print(scores)
print(scores.sort_index())

    Kohli  Rohit  Surya  Jadeja
I1    100    100     77      99
I2    150     88    110     120
I3    200      0      0       8
    Kohli  Rohit  Surya  Jadeja
I1    100    100     77      99
I2    150     88    110     120
I3    200      0      0       8


In [24]:
print(scores.sort_index(ascending=False)) # sorts rows/index in descending order

    Kohli  Rohit  Surya  Jadeja
I3    200      0      0       8
I2    150     88    110     120
I1    100    100     77      99


In [25]:
print(scores.sort_index(axis=1, ascending=False)) # sorts column names in descending order

    Surya  Rohit  Kohli  Jadeja
I1     77    100    100      99
I2    110     88    150     120
I3      0      0    200       8


In [26]:
print(scores.sort_index(axis=1)) # sorts column names in ascending order

    Jadeja  Kohli  Rohit  Surya
I1      99    100    100     77
I2     120    150     88    110
I3       8    200      0      0


In [27]:
print(scores.sort_values(by='I1', axis=1)) # sorts the columns by the values in row I1

    Surya  Jadeja  Kohli  Rohit
I1     77      99    100    100
I2    110     120    150     88
I3      0       8    200      0


In [28]:
print(scores.sort_values(by='I1', axis=1, ascending=False)) # sorts the columns by the values in row I1 in descending order

    Kohli  Rohit  Jadeja  Surya
I1    100    100      99     77
I2    150     88     120    110
I3    200      0       8      0


## 314. Working with CSV files
- One of the popular file formats used in Big Data and data science is the `.csv` format
- `CSV` stands for "Comma Separated Values" or "Character Separated Values"
- The first row in a csv file holds the column names (the metadata for the data that follows), and the rest of the lines hold the data
- Each line has values that are separated by commas
- pandas makes it super easy to load and work with csv data
- We can create a `DataFrame` by calling the `pandas.read_csv()` function with the path to the csv file; the data is loaded into a DataFrame on which you can do all the operations you've learnt so far
- You can also easily write a DataFrame to a csv file using the `pandas.DataFrame.to_csv()` method
        first_name,last_name,email,phone
        john,feguson,john@feguson.com,1234567890
        doug,bailey,doug@bailey.com,1234567891
        bob,palmer,bob@palmer.com,1234567892
        ram,sundar,ram@sundar.com,1234567893
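
The sample rows above can be loaded without creating a file by wrapping the text in `io.StringIO`, which `pandas.read_csv()` accepts in place of a path. This is a sketch only; the names `csv_text`, `contacts`, and `round_trip` are made up for the example.

```python
import io
import pandas as pd

# The sample rows from above, held in memory instead of a file on disk
csv_text = """first_name,last_name,email,phone
john,feguson,john@feguson.com,1234567890
doug,bailey,doug@bailey.com,1234567891
bob,palmer,bob@palmer.com,1234567892
ram,sundar,ram@sundar.com,1234567893
"""

contacts = pd.read_csv(io.StringIO(csv_text))    # first row becomes the column names
print(contacts)

# With no path argument, to_csv() returns the CSV text as a string
round_trip = pd.read_csv(io.StringIO(contacts.to_csv(index=False)))
print(round_trip.equals(contacts))
```

Passing `index=False` keeps the default 0-based row index out of the written text, so reading it back gives the same columns.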

## 315. Read Data
-