In [4]:
import numpy as np
import pandas as pd

### Pandas:

__Pandas is an open-source, BSD-licensed Python library that provides high-performance, easy-to-use data structures and data analysis tools. It is used in a wide range of academic and commercial fields, including finance, economics, statistics, and analytics.__

### Features of Pandas:
1. __Fast and efficient DataFrame object with default and customized indexing.__
2. __Tools for loading data into in-memory data objects from different file formats.__
3. __Data alignment and integrated handling of missing data.__
4. __Reshaping and pivoting of data sets.__
5. __Label-based slicing, indexing and subsetting of large data sets.__
6. __Columns from a data structure can be deleted or inserted.__
7. __Group by data for aggregation and transformations.__
8. __High performance merging and joining of data.__
9. __Time Series functionality.__
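
A few of the features listed above can be illustrated with a minimal sketch. The Series and DataFrame contents here are made-up values chosen for illustration only; the example shows label alignment, missing-data handling, and a group-by aggregation.

```python
import pandas as pd

# Two Series with partially overlapping labels (illustrative values).
monday = pd.Series({'apples': 3, 'oranges': 0})
tuesday = pd.Series({'oranges': 3, 'pears': 5})

# Arithmetic aligns on labels; a label missing from either side yields NaN.
total = monday + tuesday

# fill_value=0 treats a missing label as 0 instead of producing NaN.
total_filled = monday.add(tuesday, fill_value=0)

# Group rows by a key column and aggregate each group.
sales = pd.DataFrame({
    'fruit': ['apple', 'orange', 'apple', 'orange'],
    'qty': [3, 0, 2, 3],
})
qty_per_fruit = sales.groupby('fruit')['qty'].sum()

print(total)
print(total_filled)
print(qty_per_fruit)
```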

### When you want to use Pandas for data analysis, you’ll usually use it in one of three different ways:

1. __Convert a Python list, dictionary, or NumPy array to a Pandas DataFrame.__
2. __Open a local file using Pandas, usually a CSV file, but possibly another delimited text file (such as TSV), an Excel file, etc.__
3. __Open a remote file or database, such as a CSV or JSON file on a website through a URL, or read from a SQL table/database.__

![pandas_1](/concepts/assets/pandas_1.png)
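
As a rough sketch of the three approaches listed above: the snippet below builds the same small table from in-memory objects and from delimited text. `io.StringIO` stands in for a file on disk so the example is self-contained, and the URL in the final comment is hypothetical.

```python
import io

import numpy as np
import pandas as pd

# 1. From in-memory Python objects.
from_list = pd.DataFrame([[3, 0], [2, 3]], columns=['apples', 'oranges'])
from_dict = pd.DataFrame({'apples': [3, 2], 'oranges': [0, 3]})
from_array = pd.DataFrame(np.array([[3, 0], [2, 3]]), columns=['apples', 'oranges'])

# 2. From a delimited text file. io.StringIO is a stand-in for a real file;
#    a path to a TSV file would be passed to read_csv in the same way.
tsv_text = "apples\toranges\n3\t0\n2\t3\n"
from_tsv = pd.read_csv(io.StringIO(tsv_text), sep='\t')

# 3. From a remote source: pd.read_csv also accepts a URL string, e.g.
#    pd.read_csv('https://example.com/data.csv')  # hypothetical URL, not run here

print(from_tsv)
```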

### Creating Pandas Series & DataFrames

#### From a List or NumPy Array

In [5]:
data = np.array([3,2,0,1])
apple = pd.Series(data)
print(apple)

0    3
1    2
2    0
3    1
dtype: int32


In [14]:
data = np.array([0,3,7,2])
orange = pd.Series(data)
print(orange)

0    0
1    3
2    7
3    2
dtype: int64
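
The cells above wrap the list in a NumPy array first, but a plain Python list can be passed directly. The sketch below also shows the optional `index` and `name` arguments of `pd.Series`; the customer names used as labels are only illustrative.

```python
import pandas as pd

# A plain Python list works without converting it to a NumPy array first.
orange = pd.Series([0, 3, 7, 2])

# index= replaces the default 0..n-1 labels; name= labels the Series itself.
orange_named = pd.Series(
    [0, 3, 7, 2],
    index=['June', 'Robert', 'Lily', 'David'],
    name='oranges',
)

print(orange_named['Lily'])
```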


#### From Dictionary

In [17]:
apple_dict = {'a' : 0., 'b' : 1., 'c' : 2.,'d':3.}
apple = pd.Series(apple_dict)
print(apple)

a    0.0
b    1.0
c    2.0
d    3.0
dtype: float64


In [18]:
#Accessing A Series

In [19]:
apple

a    0.0
b    1.0
c    2.0
d    3.0
dtype: float64

In [21]:
apple['c'] # Access by index label

2.0

In [24]:
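apple[2] # Access by integer position (newer pandas versions deprecate this on a label-indexed Series; apple.iloc[2] is the explicit form)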

2.0

#### Selecting a Subset from a Series

In [26]:
apple[1:]

b    1.0
c    2.0
d    3.0
dtype: float64

In [29]:
apple['b':] # From label 'b' to the end

b    1.0
c    2.0
d    3.0
dtype: float64

In [32]:
#apple['a':-1] # Raises an error: use labels on both ends of a slice or integers on both ends, not a mix

In [33]:
apple[-1]

3.0

In [35]:
apple[2:-1] # From position 2 (the third value) up to, but not including, the last value

c    2.0
dtype: float64
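
The plain `[]` lookups above mix label-based and position-based access. A minimal sketch of the explicit accessors, `.loc` for labels and `.iloc` for integer positions, using the same `apple` Series:

```python
import pandas as pd

apple = pd.Series({'a': 0., 'b': 1., 'c': 2., 'd': 3.})

by_label = apple.loc['c']          # lookup by index label
by_position = apple.iloc[2]        # lookup by integer position (third element)
last = apple.iloc[-1]              # negative positions count from the end

label_slice = apple.loc['b':'c']   # label slices include the end label
position_slice = apple.iloc[1:3]   # position slices exclude the end position

print(label_slice)
print(position_slice)
```

Both slices select the rows labelled `b` and `c`, which shows the difference in how the two accessors treat the end of a slice.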

### Creating a Data Frame

__Creating DataFrames right in Python is good to know and quite useful when testing new methods and functions you find in the pandas docs.__

There are many ways to create a DataFrame from scratch, but a great option is to just use a simple dict.

Let's say we have a fruit stand that sells apples and oranges. We want to have a column for each fruit and a row for each customer purchase. To organize this as a dictionary for pandas we could do something like:

In [43]:
data = {
    'apples': [3, 2, 0, 1], 
    'oranges': [0, 3, 7, 2]
}

And then pass it to the pandas DataFrame constructor:

In [44]:
purchases = pd.DataFrame(data)
purchases

| | apples | oranges |
|---|---|---|
| 0 | 3 | 0 |
| 1 | 2 | 3 |
| 2 | 0 | 7 |
| 3 | 1 | 2 |


__How did that work?__

Each __(key, value)__ item in data corresponds to a column in the resulting DataFrame.

The Index of this DataFrame was given to us on creation as the numbers 0-3, but we could also create our own when we initialize the DataFrame.
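
To check this mapping directly, a small sketch that inspects the column labels and the default index of the same DataFrame:

```python
import pandas as pd

data = {
    'apples': [3, 2, 0, 1],
    'oranges': [0, 3, 7, 2]
}
purchases = pd.DataFrame(data)

print(list(purchases.columns))       # dictionary keys become column labels
print(list(purchases.index))         # default integer index: [0, 1, 2, 3]
print(purchases['apples'].tolist())  # each dictionary value becomes that column's data
```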

Let's have customer names as our index:

In [45]:
purchases = pd.DataFrame(data, index=['June', 'Robert', 'Lily', 'David'])
purchases

| | apples | oranges |
|---|---|---|
| June | 3 | 0 |
| Robert | 2 | 3 |
| Lily | 0 | 7 |
| David | 1 | 2 |


So now we could locate a customer's order by using their name:

In [46]:
purchases.loc['June']

apples     3
oranges    0
Name: June, dtype: int64
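
As a brief sketch of a few related lookups on the same DataFrame: `.loc` also accepts a row label together with a column label, or a list of row labels, and `.iloc` selects by integer position.

```python
import pandas as pd

purchases = pd.DataFrame(
    {'apples': [3, 2, 0, 1], 'oranges': [0, 3, 7, 2]},
    index=['June', 'Robert', 'Lily', 'David'],
)

lily_oranges = purchases.loc['Lily', 'oranges']   # single cell: row label, column label
first_row = purchases.iloc[0]                     # row by position (June's order)
two_customers = purchases.loc[['June', 'David']]  # several rows by label

print(lily_oranges)
print(two_customers)
```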

__There's more on locating and extracting data from the DataFrame later, but now you should be able to create a DataFrame with any random data to learn on. Let's move on to some quick methods for creating DataFrames from various other sources.__

Open up [this file](/concepts/assets/dummydata.csv) and look at its data.

It looks like this:

![dummydata](/concepts/assets/dummydata.png)

It's a CSV file, and pandas can read CSV files directly with `pd.read_csv`.

In [56]:
df = pd.read_csv('assets/dummydata.csv') # Path to the file
df # It loaded, but the customer names ended up in an ordinary column instead of the index

| | Unnamed: 0 | Apple | Orange |
|---|---|---|---|
| 0 | June | 3 | 0 |
| 1 | Robert | 2 | 3 |
| 2 | Lily | 0 | 7 |
| 3 | David | 1 | 2 |


In [59]:
df = pd.read_csv('assets/dummydata.csv', index_col=0)
df # index_col=0 tells pandas to use the first column of the CSV as the index

| | Apple | Orange |
|---|---|---|
| June | 3 | 0 |
| Robert | 2 | 3 |
| Lily | 0 | 7 |
| David | 1 | 2 |
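
To try the effect of `index_col` without the file on disk, here is a self-contained sketch. The CSV text is an assumption inferred from the outputs above (an unnamed first column followed by `Apple` and `Orange`), and `io.StringIO` stands in for the file.

```python
import io

import pandas as pd

# Assumed file contents, inferred from the outputs shown above.
csv_text = ",Apple,Orange\nJune,3,0\nRobert,2,3\nLily,0,7\nDavid,1,2\n"

# Without index_col, the unnamed first column is kept as a regular column.
default_df = pd.read_csv(io.StringIO(csv_text))

# With index_col=0, the first column becomes the row index.
indexed_df = pd.read_csv(io.StringIO(csv_text), index_col=0)

print(default_df.columns.tolist())
print(indexed_df.loc['Lily', 'Orange'])
```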


### External Sources:
[Pandas CheatSheet](https://www.dataquest.io/blog/pandas-cheat-sheet/) <br>
[Pandas Basic CheatSheet PDF](https://assets.datacamp.com/blog_assets/PandasPythonForDataScience.pdf)<br>
[Pandas CheatSheet PDF](https://pandas.pydata.org/Pandas_Cheat_Sheet.pdf) <br>
[Pandas DataFrame Notes](https://www.webpages.uidaho.edu/~stevel/504/Pandas%20DataFrame%20Notes.pdf)