<img src ='https://lh3.googleusercontent.com/-zJITVL-36ys/XdtR50hOknI/AAAAAAAAklE/eh54Ao2NWKkfDbSzwQedGrSuaHrRVc2vgCK8BGAsYHg/s0/2019-11-24.png'>

In the previous chapter, we dove into detail on NumPy and its ``ndarray`` object, which provides efficient storage and manipulation of dense typed arrays in Python. In this chapter:

* We'll build on this knowledge by looking in detail at the data structures provided by the Pandas library.
* Pandas is a newer package built on top of NumPy, and provides an efficient implementation of a ``DataFrame``.
* ``DataFrame``s are essentially multidimensional arrays with attached row and column labels, and often with heterogeneous types and/or missing data.
* As well as offering a convenient storage interface for labeled data, Pandas implements a number of powerful data operations familiar to users of both database frameworks and spreadsheet programs.
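
To make the description of a ``DataFrame`` concrete, here is a minimal sketch that builds one with row and column labels, mixed column types, and a missing value. The column names, row labels, and values are hypothetical and chosen only for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: names and values are made up for illustration.
df = pd.DataFrame(
    {
        "name": ["alpha", "beta", "gamma"],  # text column
        "score": [1.5, np.nan, 3.0],         # float column with one missing value
        "passed": [True, False, True],       # boolean column
    },
    index=["a", "b", "c"],                   # explicit row labels
)

print(df.dtypes)                   # each column keeps its own type
print(df.loc["b", "name"])         # look up a value by row and column label
print(df["score"].isna().sum())    # count the missing entries in one column
```

Each column carries its own type, and the missing score is stored as ``NaN`` rather than forcing the whole table into a single type.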


### Features of Pandas
* DataFrame object for data manipulation with integrated indexing.
* Tools for reading and writing data between in-memory data structures and different file formats.
* Data alignment and integrated handling of missing data.
* Reshaping and pivoting of data sets.
* Label-based slicing, fancy indexing, and subsetting of large data sets.
* Data structure column insertion and deletion.
* Group by engine allowing split-apply-combine operations on data sets.
* Data set merging and joining.
* Hierarchical axis indexing to work with high-dimensional data in a lower-dimensional data structure.
* Time series functionality: date range generation and frequency conversion, moving window statistics, moving window linear regressions, date shifting and lagging.
* Data filtering.


## Installing and Using Pandas

Installation of Pandas on your system requires NumPy to be installed, and if building the library from source, requires the appropriate tools to compile the C and Cython sources on which Pandas is built.
Details on this installation can be found in the [Pandas documentation](http://pandas.pydata.org/).
If you followed the advice outlined in the [Preface](00.00-Preface.ipynb) and used the Anaconda stack, you already have Pandas installed.

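If Pandas is not already installed, you can install it directly from the notebook, along with the optional ``pandas-profiling`` package used for generating reports:

In [1]:
!pip install pandas

In [2]:
!pip install "pandas-profiling[notebook,html]"

Once Pandas is installed, you can import it and check the version:
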
In [3]:
import pandas
pandas.__version__

'1.0.3'

Just as we generally import NumPy under the alias ``np``, we will import Pandas under the alias ``pd``:

In [4]:
import pandas as pd

In [5]:
pd.__version__

'1.0.3'

This import convention will be used throughout the remainder of this book.

In [6]:
import pandas_profiling as pp  # used for generating profiling reports

In [7]:
pp.__version__

'2.8.0'


## Reminder about Built-In Documentation

As you read through this chapter, don't forget that IPython gives you the ability to quickly explore the contents of a package (by using the tab-completion feature) as well as the documentation of various functions (using the ``?`` character). (Refer back to [Help and Documentation in IPython](01.01-Help-And-Documentation.ipynb) if you need a refresher on this.)

For example, to display all the contents of the pandas namespace, you can type

```ipython
In [3]: pd.<TAB>
```

And to display Pandas's built-in documentation, you can use this:

```ipython
In [4]: pd?
```

More detailed documentation, along with tutorials and other resources, can be found at http://pandas.pydata.org/.
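
Outside of IPython, roughly the same exploration can be done with plain Python built-ins. The sketch below uses ``dir`` as a rough stand-in for tab completion and reads a docstring directly as a stand-in for ``?``. The variable names are hypothetical, and the exact output depends on the installed Pandas version.

```python
import pandas as pd

# Public names in the pandas namespace (roughly what pd.<TAB> lists).
public_names = [name for name in dir(pd) if not name.startswith("_")]
print(len(public_names), public_names[:5])

# First line of the DataFrame docstring (a short taste of what pd.DataFrame? shows).
first_line = pd.DataFrame.__doc__.strip().splitlines()[0]
print(first_line)

# For the full text, help(pd.DataFrame) prints the complete documentation.
```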



## Data Science Life Cycle 

<img src='https://lh3.googleusercontent.com/-aOkQ7qExbEY/XdtGI_BFQ8I/AAAAAAAAkk4/Nyo1dzF0ByEFzDS5CQaV4ODCmBHOYB78ACK8BGAsYHg/s0/2019-11-24.png'/>