# Data Analysis: An Analogy
## Let's understand Data Analysis with an example
Imagine you own an apple farm, and you want to know how many apples you grow. You are too busy running the farm, so you hire someone to count them. You also sell your apples, so you ask your apple counter to record the number of apples you have at the beginning and at the end of each day.

Days and months pass. You put sheet after sheet of apple counts together, and you discover patterns and trends in the purchasing behaviour of your customers.

The trends and patterns help you realise that during the colder season, your output of apples is the same, but people buy fewer apples compared to the summer.

You then set out to dig deeper into this trend and find ways to keep the sales of apples consistent throughout the year, beating your competitors at the game and becoming an apple farm tycoon.

Apples are your data. Tracking them is important, and analysis is key.

For starters, you will know whether your supply of apples matches the market’s demand, and how consistent the ratio of demand to supply is throughout the year. Attaching a price to each apple and subtracting your costs gives you your profit.

When you have enough data, you will find trends and patterns in your production. These trends can help you understand your own organisation better, help you reduce inefficiency, and therefore reduce costs.

# What is Data Analysis?
As an apple farmer, you collected the count of apples at the beginning of the day and at the end of the day in an organized fashion in the sheets. In the end, you got some insightful information (i.e., trends and patterns in the sales of apples) from that data. This is called Data Analysis.

Putting it all together:
Data Analysis is the process of collecting, organizing, and, if required, manipulating data in order to derive useful information from it.

## Data Analysis and Pandas
Pandas is a Python library that helps you collect (or read) data from a file, organize it in a tabular format, and, if required, clean and manipulate it so that you can derive insightful information from it.
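To connect the apple farm analogy to Pandas, here is a minimal sketch. The counts and column names are made up for illustration, and it assumes no apples are added to the stock during the day, so the apples sold are simply the start count minus the end count.

```python
import pandas as pd

# Hypothetical daily apple counts; the numbers are made up for illustration.
records = pd.DataFrame({
    "season": ["summer", "summer", "winter", "winter"],
    "start_count": [500, 480, 500, 490],
    "end_count": [120, 100, 310, 330],
})

# Apples sold each day = count at the start of the day minus count at the end.
records["sold"] = records["start_count"] - records["end_count"]

# Average daily sales per season reveal the seasonal pattern.
avg_sold = records.groupby("season")["sold"].mean()
print(avg_sold)
```

In this made-up data, the winter average comes out lower than the summer average, which is the kind of trend the farmer spotted on paper.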

# What is Pandas?
The name comes from “panel data” and is also a play on “Python data analysis”.
It is an open-source Python library.
It is a tool used by data scientists to:
- read,
- write,
- manipulate, and
- analyze the data.
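The four steps above can be sketched in a few lines. This is only an illustration: the CSV content and column names are hypothetical, and the text is read from an in-memory string so the example runs without a file on disk.

```python
import io
import pandas as pd

# Hypothetical CSV content; a real workflow would pass a file path instead.
csv_text = "day,apples_sold\nMon,380\nTue,350\nWed,410\n"

# Read: parse the CSV text into a DataFrame.
sales = pd.read_csv(io.StringIO(csv_text))

# Manipulate: add a column flagging days with more than 360 apples sold.
sales["busy_day"] = sales["apples_sold"] > 360

# Analyze: compute the total number of apples sold.
total_sold = sales["apples_sold"].sum()

# Write: turn the result back into CSV text (index=False omits the row labels).
csv_out = sales.to_csv(index=False)
print(total_sold)
print(csv_out)
```

Passing a file path to read_csv and to_csv instead of the in-memory text would read from and write to disk.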
## Why Pandas?
It helps you explore and manipulate data in an efficient manner.

It helps you analyze large volumes of data with ease. Here, large volumes can mean millions of rows (records).

## Why is Pandas so Popular?
- Easy to read and learn
- Fast and powerful for working with tabular data
- Integrates well with visualization libraries such as Matplotlib
### Importing Pandas
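Before you can use a library in Python, you need to import it.

You can import Pandas in your notebook or any other Python IDE in two different ways: under its full name, or under the conventional short alias pd:

In [1]:
# Option 1: import under the full name
# import pandas

# Option 2: import under the alias pd (used in the rest of this notebook)
import pandas as pd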

# Series

# Pandas Objects
Before we dive into Series, let’s do a quick recap of pandas objects. At the core of the pandas library, there are two fundamental data structures:

- Series
- DataFrames
## What is a Series?
- A one-dimensional labeled array
- Can hold data of any type
- Is like a column in a table

## What can a Series have?
A Series can hold only numbers.

A Series can hold only strings.

A Series can hold a mix of numbers and strings.

A Series is like a Python list in that it can take values of any type, such as integers, strings, or floats (decimal values).

All the items in a Series are labeled with an index.

By default, the index starts from 0.
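As a quick illustration of the points above, the following sketch builds one Series of each kind and inspects the default index. The variable names and values are made up for the example.

```python
import pandas as pd

numbers = pd.Series([10, 20, 30])      # all numbers
words = pd.Series(["red", "green"])    # all strings
mixed = pd.Series([1, "two", 3.0])     # numbers and strings together

# Mixed content falls back to the generic "object" dtype.
print(numbers.dtype, mixed.dtype)

# Without an explicit index, the labels default to 0, 1, 2, ...
print(list(numbers.index))

# The default labels can be used to look up a value.
first_value = numbers[0]
print(first_value)
```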

## Create a Series
Remember to import the library before using it!

You can create your own Series using a Python list:

In [2]:
h = [1,1,2,3,4,5,6,76,5]
pd.Series(h)

0     1
1     1
2     2
3     3
4     4
5     5
6     6
7    76
8     5
dtype: int64

In [3]:
# You can also create your own Series using a dictionary:

d = {"one":1, "two":2, "three": 3, "four":4, "five":5, "six":6, "seven": 7}
pd.Series(d)

one      1
two      2
three    3
four     4
five     5
six      6
seven    7
dtype: int64
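Because the dictionary keys become the index labels, values can be looked up by label. The sketch below also shows that a list can be given custom labels through the index parameter; the labels a, b, c are arbitrary choices for the example.

```python
import pandas as pd

d = {"one": 1, "two": 2, "three": 3}
s = pd.Series(d)

# Look up a value by its label, much like a dictionary key.
three = s["three"]

# A list can also be given custom labels through the index parameter.
t = pd.Series([10, 20, 30], index=["a", "b", "c"])
b_value = t["b"]
print(three, b_value)
```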

# DataFrame
## What is a DataFrame?
- Two-dimensional table
- Made up of a collection of Series
- Structured with labeled axes (rows and columns)
## Create a DataFrame
You can create a DataFrame using a Python list of lists or a NumPy array:

In [5]:
# Each inner list becomes one row of the DataFrame.
data = [[1000, "Leorio", 86.58],
        [1001, "Killua", 86.58],
        [1002, "Zenitsu", 86.58],
        [1003, "Tomioka", 86.58],
        [1004, "Inosuke", 86.58]]
pd.DataFrame(data)

      0        1      2
0  1000   Leorio  86.58
1  1001   Killua  86.58
2  1002  Zenitsu  86.58
3  1003  Tomioka  86.58
4  1004  Inosuke  86.58
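The cell above uses a list of lists. For the NumPy case, a minimal sketch could look like the following; the array values and the column names a, b, c are made up for illustration.

```python
import numpy as np
import pandas as pd

# A 2-D NumPy array: each row of the array becomes one row of the DataFrame.
arr = np.array([[1, 2, 3],
                [4, 5, 6]])

df_from_array = pd.DataFrame(arr, columns=["a", "b", "c"])
print(df_from_array)

# shape reports (number of rows, number of columns).
print(df_from_array.shape)
```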


In [6]:
# Don't like the default labels starting from 0? You can supply your own column names and row labels:

data = [[1000, "Leorio", 86.58],
        [1001, "Killua", 86.58],
        [1002, "Zenitsu", 86.58],
        [1003, "Tomioka", 86.58],
        [1004, "Inosuke", 86.58]]
pd.DataFrame(data, columns=["Regd. No", "Name", "Marks"], index=[1, 2, 3, 4, 5])

   Regd. No     Name  Marks
1      1000   Leorio  86.58
2      1001   Killua  86.58
3      1002  Zenitsu  86.58
4      1003  Tomioka  86.58
5      1004  Inosuke  86.58
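Once the rows have custom labels, it helps to know the difference between selecting by label and selecting by position. The following sketch uses a shortened version of the table above to show both.

```python
import pandas as pd

data = [[1000, "Leorio", 86.58],
        [1001, "Killua", 86.58]]
df = pd.DataFrame(data, columns=["Regd. No", "Name", "Marks"], index=[1, 2])

# .loc selects by label: the row whose index label is 1.
by_label = df.loc[1, "Name"]

# .iloc selects by position: the first row and second column are at 0 and 1.
by_position = df.iloc[0, 1]
print(by_label, by_position)
```

Both lookups return the same cell here, because the row labeled 1 is also the first row by position.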


You can also create a DataFrame using a dictionary:

In [7]:
data = {"Regd.No": [1000,1001,1002,1003,1004],
       "Names": ["Leorio","Killua","Zenitsu","Tomioka", "Inosuke"],
       "Marks%": [86.29,91.63, 72.90,69.23,88.30]}
pd.DataFrame(data)

   Regd.No    Names  Marks%
0     1000   Leorio   86.29
1     1001   Killua   91.63
2     1002  Zenitsu   72.90
3     1003  Tomioka   69.23
4     1004  Inosuke   88.30


## A Column is a Series
A DataFrame is a collection of Series.

Each column in a DataFrame is a Series.

There are 3 Series in the DataFrame above: ‘Regd.No’, ‘Names’ and ‘Marks%’.
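One way to check this is to select a single column and inspect its type. The sketch below rebuilds the dictionary-based DataFrame from above and then calls a Series method on one of its columns.

```python
import pandas as pd

data = {"Regd.No": [1000, 1001, 1002, 1003, 1004],
        "Names": ["Leorio", "Killua", "Zenitsu", "Tomioka", "Inosuke"],
        "Marks%": [86.29, 91.63, 72.90, 69.23, 88.30]}
df = pd.DataFrame(data)

# Selecting one column with square brackets returns a Series.
marks = df["Marks%"]
print(type(marks))

# Series methods such as mean() then work on that column.
average = marks.mean()
print(round(average, 2))
```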