# Pandas
<img src="https://miro.medium.com/max/1400/0*1F2u74JQYI8sUuYg" alt="pandas" width="340" style="float: left;"/>

The **pandas** library is built on NumPy and provides easy-to-use **data structures** and **data analysis** tools for the Python programming language.  


In [2]:
import pandas as pd

## Series
A **one-dimensional** labeled array capable of holding any data type.


In [7]:
data = [3, -5, 7, 4]
s = pd.Series(data, index = ['a', 'b', 'c', 'd'])
s

a    3
b   -5
c    7
d    4
dtype: int64

Here, **data** can be many different things:
* a Python dict
* an ndarray
* a scalar value (like 5)

In [8]:
# Series can be instantiated from dicts:
d = {"b": 1, "a": 0, "c": 2}
pd.Series(d)

b    1
a    0
c    2
dtype: int64
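
The other two input types listed above can be sketched the same way. This is a minimal illustration; the names `s_arr` and `s_scalar` and the sample values are made up for the example.

```python
import numpy as np
import pandas as pd

# From an ndarray: the index must have the same length as the data.
s_arr = pd.Series(np.array([1.5, 2.5, 3.5]), index=['x', 'y', 'z'])

# From a scalar: the value is repeated once for every label in the index.
s_scalar = pd.Series(5, index=['a', 'b', 'c'])

print(s_arr)
print(s_scalar)
```

With a scalar, the index is what determines the length of the resulting Series.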

## DataFrame
A **two-dimensional** labeled data structure with columns of potentially different types.


In [6]:
data = {'Country': ['Belgium', 'India', 'Brazil'],
        'Capital': ['Brussels', 'New Delhi', 'Brasilia'],
        'Population': [11190846, 1301371035, 207847528]}
df = pd.DataFrame(data, columns=['Country', 'Capital', 'Population'])
df

   Country    Capital  Population
0  Belgium   Brussels    11190846
1    India  New Delhi  1301371035
2   Brazil   Brasilia   207847528
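
To make "columns of potentially different types" concrete, the sketch below rebuilds the same DataFrame and inspects its per-column dtypes. The variable name `population` is introduced only for illustration, and the exact dtype names printed can vary with the pandas version.

```python
import pandas as pd

data = {'Country': ['Belgium', 'India', 'Brazil'],
        'Capital': ['Brussels', 'New Delhi', 'Brasilia'],
        'Population': [11190846, 1301371035, 207847528]}
df = pd.DataFrame(data)

# Each column carries its own dtype (text columns vs. an integer column).
print(df.dtypes)

# Selecting a single column returns a Series that shares the DataFrame's index.
population = df['Population']
print(type(population).__name__)
```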


## I/O

### Read and Write to CSV

In [11]:
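# 'file.csv' is a placeholder path; header=None treats the first line as data, nrows=5 reads only five rows
pd.read_csv('file.csv', header=None, nrows=5)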

FileNotFoundError: [Errno 2] No such file or directory: 'file.csv'

In [12]:
df.to_csv('myDataFrame.csv')
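
Because `file.csv` above is only a placeholder, here is a self-contained round trip that can be run as-is. It is a sketch that writes to a temporary directory; the names `path`, `first_two`, and `raw` are chosen for the example.

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({'Country': ['Belgium', 'India', 'Brazil'],
                   'Population': [11190846, 1301371035, 207847528]})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'myDataFrame.csv')

    # index=False leaves the row labels out of the file.
    df.to_csv(path, index=False)

    # nrows limits how many data rows are parsed.
    first_two = pd.read_csv(path, nrows=2)

    # header=None treats the first line as data, so columns are numbered 0, 1, ...
    raw = pd.read_csv(path, header=None)

print(first_two)
print(raw.shape)
```

If the file is written with the default `index=True` instead, reading it back yields an extra column named `Unnamed: 0` unless `index_col=0` is passed to `read_csv`.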

### Read and Write to Excel

In [13]:
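# 'file.xlsx' is a placeholder path; the read fails if the file does not exist
pd.read_excel('file.xlsx')
# The target directory ('dir/') must already exist
df.to_excel('dir/myDataFrame.xlsx', sheet_name='Sheet1')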

FileNotFoundError: [Errno 2] No such file or directory: 'file.xlsx'

#### Read multiple sheets from the same file

In [14]:
# Open the workbook once, then parse individual sheets from it
xlsx = pd.ExcelFile('file.xls')
df = pd.read_excel(xlsx, sheet_name='Sheet1')

FileNotFoundError: [Errno 2] No such file or directory: 'file.xls'

### Read and Write to SQL Query or Database Table
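
A minimal sketch of a SQL round trip, assuming an in-memory SQLite database from the standard library so that nothing external is required. The table name `countries` and the query are hypothetical and exist only for this example.

```python
import sqlite3

import pandas as pd

df = pd.DataFrame({'Country': ['Belgium', 'India', 'Brazil'],
                   'Population': [11190846, 1301371035, 207847528]})

# An in-memory SQLite database; nothing is written to disk.
conn = sqlite3.connect(':memory:')

# Write the DataFrame to a table (the name 'countries' is hypothetical).
df.to_sql('countries', conn, index=False)

# Read the result of a parameterized query back into a DataFrame.
large = pd.read_sql_query(
    'SELECT Country, Population FROM countries '
    'WHERE Population > ? ORDER BY Population DESC',
    conn,
    params=(100_000_000,),
)
conn.close()

print(large)
```

For databases other than SQLite, pandas expects a SQLAlchemy connectable in place of the `sqlite3` connection used here.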

## Asking for help

In [17]:
help(pd.Series.loc)

Help on property:

    Access a group of rows and columns by label(s) or a boolean array.
    
    ``.loc[]`` is primarily label based, but may also be used with a
    boolean array.
    
    Allowed inputs are:
    
    - A single label, e.g. ``5`` or ``'a'``, (note that ``5`` is
      interpreted as a *label* of the index, and **never** as an
      integer position along the index).
    - A list or array of labels, e.g. ``['a', 'b', 'c']``.
    - A slice object with labels, e.g. ``'a':'f'``.
    
          start and the stop are included
    
    - A boolean array of the same length as the axis being sliced,
      e.g. ``[True, False, True]``.
    - A ``callable`` function with one argument (the calling Series or
      DataFrame) and that returns valid output for indexing (one of the above)
    
    See more at :ref:`Selection by Label <indexing.label>`
    
    Raises
    ------
    KeyError
        If any items are not found.
    
    See Also
    --------
    DataFrame.at : Access
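
The allowed inputs listed in the help text can be tried on the Series from earlier. This is a small sketch; the variable names (`single`, `several`, and so on) are made up for the example.

```python
import pandas as pd

s = pd.Series([3, -5, 7, 4], index=['a', 'b', 'c', 'd'])

single = s.loc['b']                          # a single label
several = s.loc[['a', 'c']]                  # a list of labels
sliced = s.loc['a':'c']                      # a label slice; both ends are included
masked = s.loc[[True, False, True, False]]   # a boolean array of the same length
positive = s.loc[lambda x: x > 0]            # a callable returning a boolean Series

# A label that is not in the index raises KeyError.
try:
    s.loc['z']
    missing = False
except KeyError:
    missing = True

print(single, several.tolist(), sliced.tolist(), masked.tolist(), positive.tolist())
```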

## Selection

### Getting