# Data analysis
<b>Last updated: 15th August 2021</b>

## <a href = "https://pandas.pydata.org/">About Pandas</a>

<b>Pandas</b> aims to be the fundamental high-level building block for doing practical, real-world data analysis in Python. Additionally, it has the broader goal of becoming the most powerful and flexible open-source data analysis and manipulation tool available in any language. <br>
Pandas is an open source, BSD-licensed library providing high-performance, easy-to-use data structures and data analysis tools for the Python programming language.

### History of development

In 2008, pandas development began at AQR Capital Management. Since 2015, pandas has been a NumFOCUS-sponsored project, which helps ensure the success of its development as a world-class open-source project.
#### Timeline
- <b>2008</b>: Development of <i>pandas</i> started
- <b>2009</b>: <i>pandas</i> becomes open source
- <b>2012</b>: First edition of <i>Python for Data Analysis</i> is published
- <b>2015</b>: <i>pandas</i> becomes a NumFOCUS-sponsored project
- <b>2018</b>: First in-person core developer sprint

### <a href = "https://pandas.pydata.org/docs/getting_started/overview.html">Library Highlights</a>

- A fast and efficient <b>DataFrame</b> object for data manipulation with integrated indexing;
- Tools for <b>reading and writing data</b> between in-memory data structures and different formats: CSV and text files, Microsoft Excel, SQL databases, and the fast HDF5 format;
- Intelligent <b>data alignment</b> and integrated handling of <b>missing data</b>: gain automatic label-based alignment in computations and easily manipulate messy data into an orderly form;
- Flexible <b>reshaping</b> and pivoting of data sets;
- Intelligent label-based <b>slicing</b>, <b>fancy indexing</b>, and <b>subsetting</b> of large data sets;
- Columns can be inserted and deleted from data structures for <b>size mutability</b>;
- Aggregating or transforming data with a powerful <b>group by</b> engine allowing split-apply-combine operations on data sets;
- High performance <b>merging and joining</b> of data sets;
- <b>Hierarchical axis indexing</b> provides an intuitive way of working with high-dimensional data in a lower-dimensional data structure;
- <b>Time series</b> functionality: date range generation and frequency conversion, moving window statistics, date shifting and lagging. You can even create domain-specific time offsets and join time series without losing data;
- Highly <b>optimized for performance</b>, with critical code paths written in Cython or C;
- Python with pandas is in use in a wide variety of <b>academic and commercial</b> domains, including Finance, Neuroscience, Economics, Statistics, Advertising, Web Analytics, and more.
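The following is a minimal sketch of a few of the highlights above: label-based alignment, missing-data handling, group by, and merging. All column names and values (`region`, `amount`, `manager`, and so on) are hypothetical example data, not taken from the pandas documentation.

```python
import pandas as pd

# Label-based alignment: arithmetic matches values by index label, not position
s1 = pd.Series([1.0, 2.0, 3.0], index=["a", "b", "c"])
s2 = pd.Series([10.0, 20.0, 30.0], index=["b", "c", "d"])
aligned_sum = s1 + s2  # "a" and "d" have no partner, so they become NaN

# Missing data: drop or fill the NaN values produced by alignment
dropped = aligned_sum.dropna()
filled = aligned_sum.fillna(0.0)

# Group by (split-apply-combine): total amount per region (hypothetical data)
sales = pd.DataFrame({
    "region": ["north", "south", "north", "south"],
    "amount": [100, 200, 150, 75],
})
totals = sales.groupby("region")["amount"].sum()

# Merging and joining: attach a (hypothetical) manager to each sales row
managers = pd.DataFrame({"region": ["north", "south"], "manager": ["Ana", "Ben"]})
merged = sales.merge(managers, on="region", how="left")

print(aligned_sum)
print(totals)
print(merged)
```

`how="left"` keeps every row of `sales`, even rows whose region has no manager. Other join types (`"inner"`, `"outer"`, `"right"`) are described in the merging section of the user guide.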

### <a href = "https://pandas.pydata.org/docs/getting_started/install.html">Installation</a>

In [None]:
# pandas can be installed in several ways, depending on your environment.

# Using pip
# $ pip install pandas

# Or using conda
# $ conda install pandas
# $ conda install pandas=0.20.3  # to install a specific pandas version

# Ubuntu
# $ sudo apt-get install python3-pandas

# CentOS/RHEL
# $ sudo yum install python3-pandas

In [2]:
import pandas
pandas.__version__

'1.2.4'

## <a href = "https://pandas.pydata.org/docs/getting_started/index.html#getting-started">Getting started</a>
New to pandas? Check out the getting started guides. They contain an introduction to pandas’ main concepts and links to additional tutorials.


### Data structures
| Dimensions | Name | Description |
|---|---|---|
| 1 | Series | 1D labeled, homogeneously-typed array |
| 2 | DataFrame | General 2D labeled, size-mutable tabular structure with potentially heterogeneously-typed columns |

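A DataFrame is a container for Series objects, so iterating over its columns yields one Series per column:

In [None]:
import pandas as pd
# Small example DataFrame; each column is a Series
df = pd.DataFrame({"a": [1, 2, 3], "b": [4.0, 5.0, 6.0]})
for col in df.columns:
    series = df[col]
    # do something with series, e.g. inspect its dtype
    print(col, series.dtype)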
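To make the table above concrete, the sketch below builds one Series and one DataFrame, then inserts and deletes a column to show size mutability. The product names, prices, and quantities are made-up illustration data.

```python
import pandas as pd

# A Series is one-dimensional and holds values of a single dtype
prices = pd.Series([0.5, 0.75, 0.25], index=["apple", "pear", "plum"], name="price")

# A DataFrame is two-dimensional; each column can have its own dtype
df = pd.DataFrame({
    "name": ["apple", "pear", "plum"],  # text column
    "price": [0.5, 0.75, 0.25],         # float column
    "in_stock": [True, False, True],    # boolean column
})

# Size mutability: columns can be inserted and deleted in place
df["quantity"] = [10, 0, 25]
del df["in_stock"]

print(prices.dtype)  # one dtype for the whole Series
print(df.dtypes)     # one dtype per column
```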

### <a href = "https://pandas.pydata.org/docs/getting_started/intro_tutorials/index.html">Getting started tutorials</a>
- <a href = "https://pandas.pydata.org/docs/getting_started/intro_tutorials/01_table_oriented.html">What kind of data does pandas handle?</a>
- <a href = "https://pandas.pydata.org/docs/getting_started/intro_tutorials/02_read_write.html">How do I read and write tabular data?</a>
- <a href = "https://pandas.pydata.org/docs/getting_started/intro_tutorials/03_subset_data.html">How do I select a subset of a <b>DataFrame</b>?</a>
- <a href = "https://pandas.pydata.org/docs/getting_started/intro_tutorials/04_plotting.html">How to create plots in pandas?</a>
- <a href = "https://pandas.pydata.org/docs/getting_started/intro_tutorials/05_add_columns.html">How to create new columns derived from existing columns?</a>
- <a href = "https://pandas.pydata.org/docs/getting_started/intro_tutorials/06_calculate_statistics.html">How to calculate summary statistics?</a>
- <a href = "https://pandas.pydata.org/docs/getting_started/intro_tutorials/07_reshape_table_layout.html">How to reshape the layout of tables?</a>
- <a href = "https://pandas.pydata.org/docs/getting_started/intro_tutorials/08_combine_dataframes.html">How to combine data from multiple tables?</a>
- <a href = "https://pandas.pydata.org/docs/getting_started/intro_tutorials/09_timeseries.html">How to handle time series data with ease?</a>
- <a href = "https://pandas.pydata.org/docs/getting_started/intro_tutorials/10_text_data.html">How to manipulate textual data?</a>
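Several of the tutorial topics above fit into one short workflow: read, subset, derive a column, summarize, reshape, and write. The sketch below is written under a few assumptions. The CSV content is hypothetical and is read from an in-memory buffer (`io.StringIO`) rather than a file. The threshold used for the derived column is arbitrary.

```python
import io

import pandas as pd

# Hypothetical CSV content; pass a file path to read_csv to read a real file
csv_text = """station,date,no2
Paris,2019-05-07,25.0
London,2019-05-07,23.0
Paris,2019-05-08,27.5
London,2019-05-08,19.0
"""
air = pd.read_csv(io.StringIO(csv_text), parse_dates=["date"])

# Select a subset: rows where no2 exceeds 20, keeping only two columns
subset = air.loc[air["no2"] > 20, ["station", "no2"]]

# Derive a new column from an existing one (arbitrary threshold)
air["exceeds_25"] = air["no2"] > 25

# Summary statistics: mean no2 per station
mean_by_station = air.groupby("station")["no2"].mean()

# Reshape: one row per date, one column per station
wide = air.pivot(index="date", columns="station", values="no2")

# Write back out; with no path argument, to_csv returns the CSV as a string
out = air.to_csv(index=False)
print(wide)
```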

## <a href = "https://pandas.pydata.org/docs/user_guide/index.html#user-guide">User guide</a>
The user guide provides in-depth information on the key concepts of pandas with useful background information and explanation.
- <a href = "https://pandas.pydata.org/docs/user_guide/10min.html">10 minutes to pandas</a>
- <a href = "https://pandas.pydata.org/docs/user_guide/dsintro.html">Intro to data structures</a>
- <a href = "https://pandas.pydata.org/docs/user_guide/basics.html">Essential basic functionality</a>
- <a href = "https://pandas.pydata.org/docs/user_guide/io.html">IO tools (text, CSV, HDF5, …)</a>
- <a href = "https://pandas.pydata.org/docs/user_guide/indexing.html">Indexing and selecting data</a>
- <a href = "https://pandas.pydata.org/docs/user_guide/advanced.html">MultiIndex / advanced indexing</a>
- <a href = "https://pandas.pydata.org/docs/user_guide/merging.html">Merge, join, concatenate and compare</a>
- <a href = "https://pandas.pydata.org/docs/user_guide/reshaping.html">Reshaping and pivot tables</a>
- <a href = "https://pandas.pydata.org/docs/user_guide/text.html">Working with text data</a>
- <a href = "https://pandas.pydata.org/docs/user_guide/missing_data.html">Working with missing data</a>
- <a href = "https://pandas.pydata.org/docs/user_guide/duplicates.html">Duplicate Labels</a>
- <a href = "https://pandas.pydata.org/docs/user_guide/categorical.html">Categorical data</a>
- <a href = "https://pandas.pydata.org/docs/user_guide/integer_na.html">Nullable integer data type</a>
- <a href = "https://pandas.pydata.org/docs/user_guide/visualization.html">Chart Visualization</a>
- <a href = "https://pandas.pydata.org/docs/user_guide/style.html">Table Visualization</a>
- <a href = "https://pandas.pydata.org/docs/user_guide/computation.html">Computational tools</a>
- <a href = "https://pandas.pydata.org/docs/user_guide/groupby.html">Group by: split-apply-combine</a>
- <a href = "https://pandas.pydata.org/docs/user_guide/window.html">Windowing Operations</a>
- <a href = "https://pandas.pydata.org/docs/user_guide/timeseries.html">Time series / date functionality</a>
- <a href = "https://pandas.pydata.org/docs/user_guide/timedeltas.html">Time deltas</a>
- <a href = "https://pandas.pydata.org/docs/user_guide/options.html">Options and settings</a>
- <a href = "https://pandas.pydata.org/docs/user_guide/enhancingperf.html">Enhancing performance</a>
- <a href = "https://pandas.pydata.org/docs/user_guide/scale.html">Scaling to large datasets</a>
- <a href = "https://pandas.pydata.org/docs/user_guide/sparse.html">Sparse data structures</a>
- <a href = "https://pandas.pydata.org/docs/user_guide/gotchas.html">Frequently Asked Questions (FAQ)</a>
- <a href = "https://pandas.pydata.org/docs/user_guide/cookbook.html">Cookbook</a>
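The sketch below gives a small taste of three user-guide topics: the nullable integer data type, categorical data, and windowing operations on a time series. The values and category names are illustrative only.

```python
import pandas as pd

# Nullable integer dtype ("Int64", capital I): missing values are stored as pd.NA
counts = pd.Series([3, None, 5], dtype="Int64")

# Categorical data: a fixed, ordered set of allowed values
size_type = pd.CategoricalDtype(["small", "medium", "large"], ordered=True)
sizes = pd.Series(["small", "large", "medium", "small"]).astype(size_type)

# Time series + windowing: a 2-day rolling mean over a daily DatetimeIndex
idx = pd.date_range("2021-08-01", periods=5, freq="D")
values = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0], index=idx)
rolling_mean = values.rolling(window=2).mean()  # first value is NaN (window not full)

print(counts)
print(sizes.max())  # ordering makes min/max meaningful for categories
print(rolling_mean)
```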

## <a href = "https://pandas.pydata.org/docs/reference/index.html#api">API Reference</a>
The reference guide contains a detailed description of the pandas API. The reference describes how the methods work and which parameters can be used. It assumes that you have an understanding of the key concepts.
- <a href = "https://pandas.pydata.org/docs/reference/io.html">Input/output</a>
- <a href = "https://pandas.pydata.org/docs/reference/general_functions.html">General functions</a>
- <a href = "https://pandas.pydata.org/docs/reference/series.html">Series</a>
- <a href = "https://pandas.pydata.org/docs/reference/frame.html">DataFrame</a>
- <a href = "https://pandas.pydata.org/docs/reference/arrays.html">pandas arrays</a>
- <a href = "https://pandas.pydata.org/docs/reference/indexing.html">Index objects</a>
- <a href = "https://pandas.pydata.org/docs/reference/offset_frequency.html">Date offsets</a>
- <a href = "https://pandas.pydata.org/docs/reference/window.html">Window</a>
- <a href = "https://pandas.pydata.org/docs/reference/groupby.html">GroupBy</a>
- <a href = "https://pandas.pydata.org/docs/reference/resampling.html">Resampling</a>
- <a href = "https://pandas.pydata.org/docs/reference/style.html">Style</a>
- <a href = "https://pandas.pydata.org/docs/reference/plotting.html">Plotting</a>
- <a href = "https://pandas.pydata.org/docs/reference/general_utility_functions.html">General utility functions</a>
- <a href = "https://pandas.pydata.org/docs/reference/extensions.html">Extensions</a>
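The following sketch exercises a few of the API areas listed above: the general functions `pd.to_datetime` and `pd.concat`, resampling, and date offsets. The timestamps and readings are made-up values chosen only to illustrate the calls.

```python
import pandas as pd

# General functions: pd.to_datetime parses strings into a DatetimeIndex
stamps = pd.to_datetime(["2021-08-01 09:00", "2021-08-01 17:00", "2021-08-02 12:00"])
readings = pd.Series([10, 20, 5], index=stamps)

# Resampling: aggregate the irregular readings into daily totals
daily = readings.resample("D").sum()

# Date offsets: roll a Timestamp forward to the end of its month
ts = pd.Timestamp("2021-08-15")
month_end = ts + pd.offsets.MonthEnd()

# General functions: pd.concat stacks objects along an axis (rows by default)
combined = pd.concat([readings, readings * 2])

print(daily)
print(month_end)
```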

## Reference
### External Resources
- <a href = "https://pandas.pydata.org/docs/getting_started/tutorials.html">Community tutorials</a>
- <a href = "https://pandas.pydata.org/docs/getting_started/comparison/index.html">Comparison with other tools</a>

### Links
- <a href = "https://pandas.pydata.org/docs/pandas.pdf">Pandas documentation (PDF)</a>
- <a href = "https://www.amazon.com/gp/product/1491957662/ref=as_li_qf_asin_il_tl?ie=UTF8&tag=quantpytho-20&creative=9325&linkCode=as2&creativeASIN=1491957662&linkId=ea8de4253cce96046e8ab0383ac71b33">Python for Data Analysis: Data Wrangling with Pandas, NumPy, and IPython 2nd Edition</a>
- <a href = "https://pandas.pydata.org/Pandas_Cheat_Sheet.pdf">Data Wrangling with pandas Cheat Sheet</a>
- <a href = "https://stackoverflow.com/questions/tagged/pandas">Stack Overflow: pandas questions</a>
- <a href = "https://github.com/pandas-dev/pandas">Source Repository</a>