# Pandas: Data Analysis Made Easy

[Pandas](https://pandas.pydata.org/) is an open-source data manipulation and analysis library for Python. It provides powerful tools for working with structured data, making data analysis tasks more efficient and intuitive.

## Why Pandas?

- **Flexible Data Structures:** Pandas offers two main data structures: Series (1-dimensional) and DataFrame (2-dimensional), which can handle both labeled and unlabeled data.

- **Data Cleaning and Preparation:** Pandas simplifies the process of cleaning and preparing data by providing functions to handle missing data, duplicate entries, data type conversions, and more.

- **Data Exploration and Analysis:** With Pandas, you can easily explore and analyze your data using functions for filtering, sorting, grouping, aggregating, and visualizing data.

- **Integration with Other Libraries:** Pandas seamlessly integrates with other Python libraries like NumPy, Matplotlib, and scikit-learn, making it a powerful tool for data analysis and machine learning workflows.

- **Rich Functionality:** Pandas offers a wide range of functions and methods for data manipulation, including merging and joining datasets, reshaping data, time series analysis, and handling large datasets efficiently.

- **Community Support:** Pandas has a large and active community of users and developers who contribute to its development, provide support, and share resources and best practices.
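
The filtering, sorting, grouping, and aggregating mentioned in the list above can be sketched on a tiny table. The data and the variable names (`df`, `over_30`, `by_age`, `mean_age_by_state`) are hypothetical and exist only for illustration; this is a minimal sketch, not a recommended workflow.

```python
import pandas as pd

# Hypothetical toy data, invented for illustration only.
df = pd.DataFrame(
    {
        "city": ["Aloha", "Wichita", "Shelton", "Wichita"],
        "state": ["OR", "KS", "WA", "KS"],
        "age": [47, 23, 53, 31],
    }
)

# Filtering: keep rows where age is above 30.
over_30 = df[df["age"] > 30]

# Sorting: order rows by age, oldest first.
by_age = df.sort_values("age", ascending=False)

# Grouping and aggregating: mean age per state.
mean_age_by_state = df.groupby("state")["age"].mean()

print(over_30)
print(by_age)
print(mean_age_by_state)
```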

## Getting Started with Pandas

To get started with Pandas, you can install it using pip:

```bash
pip install pandas
```

In a Jupyter notebook, prefix the command with `!` to run it from a cell:

In [1]:
! pip install pandas

Once installed, you can import Pandas in your Python scripts or Jupyter notebooks and start working with your data.

```python
import pandas as pd
```

In [2]:
import pandas as pd

## Reading CSV files

In [3]:
reviews = pd.read_csv("shootings.csv", index_col=0)
# first 3 rows are printed
reviews.head(3)

| id | name | date | manner_of_death | armed | age | gender | race | city | state | signs_of_mental_illness | threat_level | flee | body_camera | arms_category |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 3 | Tim Elliot | 2015-01-02 | shot | gun | 53.0 | M | Asian | Shelton | WA | True | attack | Not fleeing | False | Guns |
| 4 | Lewis Lee Lembke | 2015-01-02 | shot | gun | 47.0 | M | White | Aloha | OR | False | attack | Not fleeing | False | Guns |
| 5 | John Paul Quintero | 2015-01-03 | shot and Tasered | unarmed | 23.0 | M | Hispanic | Wichita | KS | False | other | Not fleeing | False | Unarmed |
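
The effect of `index_col=0` in the cell above can be checked without a file on disk. The sketch below reads a small, made-up CSV from an in-memory string; `csv_text`, `default_index`, and `id_index` are hypothetical names, and the rows are invented.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for a file on disk.
csv_text = """id,name,age
3,Alice,53
4,Bob,47
5,Carol,23
"""

# Default: pandas adds a 0-based RangeIndex and keeps "id" as a regular column.
default_index = pd.read_csv(io.StringIO(csv_text))

# index_col=0: the first column ("id") becomes the row index instead.
id_index = pd.read_csv(io.StringIO(csv_text), index_col=0)

print(default_index.columns.tolist())
print(id_index.columns.tolist())
print(id_index.index.tolist())
```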


In [4]:
# last 5 rows
reviews.tail()

| id | name | date | manner_of_death | armed | age | gender | race | city | state | signs_of_mental_illness | threat_level | flee | body_camera | arms_category |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 5916 | Rayshard Brooks | 2020-06-12 | shot | Taser | 27.0 | M | Black | Atlanta | GA | False | attack | Foot | True | Electrical devices |
| 5925 | Caine Van Pelt | 2020-06-12 | shot | gun | 23.0 | M | Black | Crown Point | IN | False | attack | Car | False | Guns |
| 5918 | Hannah Fizer | 2020-06-13 | shot | unarmed | 25.0 | F | White | Sedalia | MO | False | other | Not fleeing | False | Unarmed |
| 5921 | William Slyter | 2020-06-13 | shot | gun | 22.0 | M | White | Kansas City | MO | False | other | Other | False | Guns |
| 5924 | Nicholas Hirsh | 2020-06-15 | shot | gun | 31.0 | M | White | Lawrence | KS | False | attack | Car | False | Guns |


In [18]:
# All columns of the dataframe
reviews.columns

Index(['name', 'date', 'manner_of_death', 'armed', 'age', 'gender', 'race',
       'city', 'state', 'signs_of_mental_illness', 'threat_level', 'flee',
       'body_camera', 'arms_category'],
      dtype='object')

In [5]:
# confirm that the loaded CSV file is a DataFrame
type(reviews)

pandas.core.frame.DataFrame

In [7]:
# skip file lines 1 and 10 (0-indexed; line 0 is the header), i.e. the 1st and 10th data rows
skip_rows = pd.read_csv("shootings.csv", skiprows=[1, 10])
skip_rows.head(3)

|  | id | name | date | manner_of_death | armed | age | gender | race | city | state | signs_of_mental_illness | threat_level | flee | body_camera | arms_category |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 4 | Lewis Lee Lembke | 2015-01-02 | shot | gun | 47.0 | M | White | Aloha | OR | False | attack | Not fleeing | False | Guns |
| 1 | 5 | John Paul Quintero | 2015-01-03 | shot and Tasered | unarmed | 23.0 | M | Hispanic | Wichita | KS | False | other | Not fleeing | False | Unarmed |
| 2 | 8 | Matthew Hoffman | 2015-01-04 | shot | toy weapon | 32.0 | M | White | San Francisco | CA | True | attack | Not fleeing | False | Other unusual objects |
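
When `skiprows` is given a list, the numbers are 0-based line numbers of the file, so line 0 is the header. The sketch below illustrates this on a generated in-memory CSV; `csv_text` and `skipped` are hypothetical names and the rows are synthetic.

```python
import io

import pandas as pd

# Synthetic CSV: a header line followed by 12 data rows with ids 1..12.
csv_text = "id,name\n" + "".join(f"{i},row{i}\n" for i in range(1, 13))

# Lines 1 and 10 of the file are dropped; line 0 (the header) is kept.
skipped = pd.read_csv(io.StringIO(csv_text), skiprows=[1, 10])

# The rows with id 1 and id 10 are missing from the result.
print(skipped["id"].tolist())
```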


In [11]:
# number of null values in each column
null_count_per_column = reviews.isnull().sum()
null_count_per_column

name                       0
date                       0
manner_of_death            0
armed                      0
age                        0
gender                     0
race                       0
city                       0
state                      0
signs_of_mental_illness    0
threat_level               0
flee                       0
body_camera                0
arms_category              0
dtype: int64
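
The dataset above has no missing values, so every count is zero. To see what `isnull().sum()` reports when values are missing, here is a minimal sketch on an invented frame; `df`, `null_counts`, and `rows_with_nulls` are hypothetical names.

```python
import pandas as pd

# Invented data with one missing value in each column.
df = pd.DataFrame(
    {
        "age": [53.0, None, 23.0],
        "race": ["Asian", "White", None],
    }
)

# isnull() marks missing cells as True; sum() counts them per column.
null_counts = df.isnull().sum()

# Rows that contain at least one missing value.
rows_with_nulls = df[df.isnull().any(axis=1)]

print(null_counts)
print(rows_with_nulls)
```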

### describe()
The `describe()` function generates descriptive statistics for a DataFrame. By default it summarizes each numeric column with its count, mean, standard deviation, minimum, maximum, and the 25th, 50th, and 75th percentiles. This gives a quick overview of the distribution and central tendency of the numeric data.

In [12]:
# only numeric columns are considered
reviews.describe()

|  | age |
| --- | --- |
| count | 4895.0 |
| mean | 36.54975 |
| std | 12.694348 |
| min | 6.0 |
| 25% | 27.0 |
| 50% | 35.0 |
| 75% | 45.0 |
| max | 91.0 |
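
As a small illustration of the default behaviour, the sketch below runs `describe()` on an invented frame with one numeric and one text column, and then passes `include="all"` to summarize the text column as well. The names `df`, `numeric_summary`, and `all_summary` are hypothetical.

```python
import pandas as pd

# Invented data: one numeric column and one text column.
df = pd.DataFrame(
    {
        "age": [20.0, 30.0, 40.0, 50.0],
        "gender": ["M", "F", "M", "M"],
    }
)

# Default: only the numeric column is summarized.
numeric_summary = df.describe()

# include="all" adds count, unique, top, and freq for non-numeric columns.
all_summary = df.describe(include="all")

print(numeric_summary)
print(all_summary)
```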


In [13]:
# show the data type of each column
reviews.dtypes

name                        object
date                        object
manner_of_death             object
armed                       object
age                        float64
gender                      object
race                        object
city                        object
state                       object
signs_of_mental_illness       bool
threat_level                object
flee                        object
body_camera                   bool
arms_category               object
dtype: object
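
The `date` column above is stored as text rather than as a date type. One possible way to convert column types, assuming the values are well-formed, is `pd.to_datetime` together with `astype`. The frame below is invented and `df` is a hypothetical name; this is a sketch, not a step taken in the original notebook.

```python
import pandas as pd

# Invented data mirroring two of the column types shown above.
df = pd.DataFrame(
    {
        "date": ["2015-01-02", "2015-01-03"],
        "age": [53.0, 47.0],
    }
)

# Parse the text dates into datetime64 values.
df["date"] = pd.to_datetime(df["date"])

# Convert float ages to integers (only safe when no values are missing).
df["age"] = df["age"].astype("int64")

print(df.dtypes)
```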

### Save as CSV file

In [16]:

reviews.to_csv("reviews.csv")  # defaults: write the index, the header row, and all columns

reviews.to_csv("reviews_1.csv", index=False)  # omit the index column

reviews.to_csv("reviews_2.csv", index=False, header=False)  # omit the index and the header row

reviews.to_csv("reviews_3.csv", index=False, header=False, columns=None)  # columns=None is the default and still writes every column

reviews.to_csv("reviews_4.csv", index=False, header=False, columns=None, sep=",")  # sep="," is the default field separator

reviews.to_csv("reviews_5.csv", index=False, header=False, columns=None, sep=",", encoding="utf-8")  # set the file encoding explicitly (utf-8 is also the default)

reviews.to_csv("reviews_6.csv", index=False, sep="#", columns=['gender', 'age'])  # omit the index, separate fields with '#', and write only the gender and age columns
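
To see what these options change without creating files, `to_csv` can be called without a path, in which case it returns the CSV text as a string. The sketch below uses an invented two-row frame; `df`, `with_index`, `no_index`, `no_header`, and `subset` are hypothetical names.

```python
import pandas as pd

# Invented data with a named index, similar in shape to the frame above.
df = pd.DataFrame(
    {"gender": ["M", "F"], "age": [53, 25]},
    index=pd.Index([3, 5918], name="id"),
)

with_index = df.to_csv()                           # index + header + all columns
no_index = df.to_csv(index=False)                  # header + all columns
no_header = df.to_csv(index=False, header=False)   # data rows only
subset = df.to_csv(index=False, sep="#", columns=["gender", "age"])

for text in (with_index, no_index, no_header, subset):
    print(text)
```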

## Series in DataFrame

In Pandas, a Series is a one-dimensional labeled array that can hold data of any type (e.g., integers, strings, floats).

### Usage

A Series is commonly used to represent a single column or row of data in a DataFrame. It can be created from various data structures like lists, dictionaries, or NumPy arrays.
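
A minimal sketch of the three construction routes mentioned above. The values and the names `from_list`, `from_dict`, and `from_array` are hypothetical.

```python
import numpy as np
import pandas as pd

# From a list: pandas assigns a default 0-based index.
from_list = pd.Series([53.0, 47.0, 23.0], name="age")

# From a dictionary: the keys become the index labels.
from_dict = pd.Series({"WA": 1, "OR": 2})

# From a NumPy array, with explicit index labels.
from_array = pd.Series(np.array([1, 2, 3]), index=["a", "b", "c"])

print(from_list)
print(from_dict)
print(from_array)
```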

In [19]:
type(reviews['age'])

pandas.core.series.Series

In [21]:
# indexing with a list of column names (double square brackets) returns a DataFrame
type(reviews[['age']])

pandas.core.frame.DataFrame

In [27]:
# select several columns by name
reviews[['age', 'gender']]

| id | age | gender |
| --- | --- | --- |
| 3 | 53.0 | M |
| 4 | 47.0 | M |
| 5 | 23.0 | M |
| 8 | 32.0 | M |
| 9 | 39.0 | M |
| ... | ... | ... |
| 5916 | 27.0 | M |
| 5925 | 23.0 | M |
| 5918 | 25.0 | F |
| 5921 | 22.0 | M |
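
The difference between the selections above can also be checked on a small invented frame. The sketch below additionally selects a single row by its index label with `.loc`, which is not used in the cells above and is included here only as an illustration; `df`, `one_column`, `one_column_frame`, and `one_row` are hypothetical names.

```python
import pandas as pd

# Invented data indexed by an "id" label.
df = pd.DataFrame(
    {"age": [53.0, 47.0], "gender": ["M", "M"]},
    index=pd.Index([3, 4], name="id"),
)

one_column = df["age"]          # single brackets: a Series
one_column_frame = df[["age"]]  # double brackets: a one-column DataFrame
one_row = df.loc[3]             # one row by label: a Series keyed by column name

print(type(one_column))
print(type(one_column_frame))
print(one_row)
```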


The Pandas documentation and online resources provide comprehensive guidance and tutorials to help you learn and master the library for your data analysis tasks.

## Conclusion
Pandas is an essential tool for data analysts, scientists, and engineers working with structured data in Python. Its intuitive interface, rich functionality, and seamless integration with other libraries make it the go-to choice for data manipulation and analysis tasks.

Happy coding with Pandas! 🐼🚀