# <font color='289C4E'>Data structures</font><a class='anchor' id='top'></a>
### <font color='blue'>Series</font><a class='anchor' id='top'></a>
- [Array-like](#1)
- [Dictionary](#2)
- [Scalar](#3)
- [Series is similar to array](#4)
- [Series is similar to dictionary](#5)
- [Name attribute](#6)
### <font color='blue'>DataFrame</font><a class='anchor' id='top'></a>
- [From dict of Series or dicts](#1)
- [From dict of array-likes](#2)
- [From a list of dicts](#3)
- [From a dict of tuples](#4)
- [From a Series](#5)

## <font color='289C4E'>Data structures</font><a class='anchor' id='top'></a>
Pandas is built around two basic data structures: Series and DataFrame. Older releases also shipped a third, Panel, but it has since been removed from the library (as of pandas 0.25). There are extensions to this list, but for the purposes of this material Series and DataFrame are more than enough.

We start by importing NumPy and Pandas using their conventional short names:



In [1]:
pip install numpy

Note: you may need to restart the kernel to use updated packages.


In [2]:
pip install pandas

Note: you may need to restart the kernel to use updated packages.


In [3]:
import numpy as np

In [4]:
import pandas as pd

## <font color='blue'>Series</font><a class='anchor' id='top'></a>
Series is a one-dimensional labeled array capable of holding any data type (integers, strings, floating point numbers, Python objects, etc.). The axis labels are collectively referred to as the index. The basic method to create a Series is to call:

 s = pd.Series(data, index=index)

The first argument, data, can be

- array-like
- dictionary
- scalar

In [5]:
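marks = ['91', '82', '93', '94']  # marks are stored as strings, hence dtype: object below
name = ["hari", "Givina", "Shyam", "Gita"]
student = pd.Series(marks, index=name)  # the names become the index labels
print(student)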

hari      91
Givina    82
Shyam     93
Gita      94
dtype: object


### Array-like
If data is an array-like, index must be the same length as data. If no index is passed, one will be created having values [0, ..., len(data) - 1].

In [6]:
randn = np.random.randn  # To shorten notation in the code that follows

In [None]:
s = pd.Series(randn(5), index=['a', 'b', 'c', 'd', 'e'])
s

In [None]:
s.index

In [None]:
pd.Series(randn(5))
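
The length rule stated above can be checked directly. The following is a minimal sketch with made-up values (the variable names are illustrative only): it shows the default integer index and the error raised when the index length does not match the data.

```python
import numpy as np
import pandas as pd

values = np.array([10.0, 20.0, 30.0])

# Without an index, pandas creates the default integer labels 0..len(data)-1.
default_indexed = pd.Series(values)
print(list(default_indexed.index))

# An index whose length differs from the data is rejected with a ValueError.
try:
    pd.Series(values, index=["a", "b"])
    mismatch_raised = False
except ValueError as exc:
    mismatch_raised = True
    print("ValueError:", exc)
```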

### Dictionary
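A dictionary already has a natural candidate for the index: its keys. Passing an index separately is therefore optional, but it is useful for choosing which keys to keep and in what order; any label in the index that is missing from the dictionary gets the value NaN.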



In [None]:
d = {'a' : 0., 'b' : 1., 'c' : 2.}
pd.Series(d)

In [None]:
pd.Series(d, index=['b', 'c', 'd', 'a'])
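
To make the effect of the explicit index easier to see, here is a small sketch that repeats the call above and inspects the result. The name `reordered` is only for illustration.

```python
import pandas as pd

d = {"a": 0.0, "b": 1.0, "c": 2.0}

# The index decides both the order of the labels and which keys are used.
reordered = pd.Series(d, index=["b", "c", "d", "a"])
print(reordered)

# 'd' is not a key of the dictionary, so its value is missing (NaN).
print(reordered.isna())
```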

### Scalar
If data is a scalar value, an index must be provided. The value will be repeated to match the length of index.



In [None]:
pd.Series(5., index=['a', 'b', 'c', 'd', 'e'])

### Series is similar to array
Slicing and other operations on a Series produce results very similar to those on a NumPy array, with one twist: the index is sliced along with the values and always remains part of the result.

In [None]:
s.iloc[1]  # position-based access; recent pandas versions deprecate s[1] on a label index

In [None]:
s[:4]

In [None]:
s[s > s.median()]

In [None]:
s.iloc[[4, 3, 1]]  # select several positions at once

As with NumPy arrays, most operations on a Series are vectorized, so explicit element-by-element loops are usually unnecessary.

In [None]:
s + s

In [None]:
s * 2

In [None]:
np.exp(s)
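
As a rough illustration of what vectorization replaces, the sketch below computes the same doubling once with an explicit Python loop and once as a single expression. The sample values are made up, and no timing claim is made here.

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, 2.0, 3.0], index=["a", "b", "c"])

# Explicit Python loop over the (label, value) pairs.
looped = pd.Series({label: value * 2 for label, value in s.items()})

# Vectorized form: one expression applied to every element.
vectorized = s * 2

# Both approaches give the same labels and values.
print(np.allclose(looped.values, vectorized.values))
```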

A key difference between Series and array is that operations between Series automatically align the data based on label. Thus, you can write computations without giving consideration to whether the Series involved have the same labels.

In [None]:
s[1:] + s[:-1]

The result of an operation between unaligned Series will have the union of the indexes involved. If a label is not found in one Series or the other, the result for that label will be marked as missing (NaN). Being able to write code without doing any explicit data alignment grants immense freedom and flexibility in interactive data analysis and research. The integrated data alignment features of the pandas data structures set pandas apart from the majority of related tools for working with labeled data.
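
The alignment behaviour can be followed on a Series with known values. This is a minimal sketch: the numbers are invented, and `Series.add` with `fill_value` is shown only as one possible way to avoid the missing values.

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0], index=["a", "b", "c", "d", "e"])

left = s.iloc[1:]    # labels b, c, d, e
right = s.iloc[:-1]  # labels a, b, c, d

# Values are matched by label, not by position.
total = left + right
print(total)

# Labels present on only one side ('a' and 'e') come out as NaN.
print(total.isna())

# Treat a missing partner as 0 instead of producing NaN.
filled = left.add(right, fill_value=0)
print(filled)
```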

### Series is similar to dictionary
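A Series also behaves like a fixed-size dictionary: values can be read and set by index label, and the in operator tests whether a label is present.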

In [None]:
s['a']

In [None]:
s['e'] = 12.

In [None]:
s

In [None]:
'e' in s

In [None]:
'f' in s
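
The dictionary analogy extends to missing labels. The sketch below, using invented values, shows that indexing with an absent label raises a KeyError, while `Series.get` returns a default instead.

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, 2.0, 3.0], index=["a", "b", "c"])

# Indexing with a label that is not in the index raises KeyError, as with a dict.
try:
    s["f"]
    missing_raised = False
except KeyError:
    missing_raised = True

# get() returns None, or the supplied default, when the label is absent.
print(s.get("f"))
print(s.get("f", np.nan))
print(s.get("a"))
```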

### Name attribute
Series can also have a name attribute which will become very useful when summarizing data with tables and plots.



In [None]:
s = pd.Series(np.random.randn(5),index=["a","b","c","d","e"], name='random series')
s

In [None]:
s.name
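
A name can also be changed after construction. The following sketch assumes a small Series with made-up values and uses `Series.rename`, which returns a renamed copy and leaves the original untouched.

```python
import pandas as pd

s = pd.Series([1.0, 2.0, 3.0], name="random series")

# rename() with a scalar returns a new Series carrying the new name.
renamed = s.rename("different")

print(s.name)
print(renamed.name)
```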

## <font color='blue'>DataFrame</font><a class='anchor' id='top'></a>
DataFrame is a 2-dimensional labeled data structure with columns of potentially different types. Like Series, DataFrame accepts many different kinds of input:

- Dict of 1D ndarrays, lists, dicts, or Series
- 2-D numpy.ndarray
- A Series
- Another DataFrame

Along with the data, you can optionally pass index (row labels) and columns (column labels) arguments. If you pass an index and / or columns, you are guaranteeing the index and / or columns of the resulting DataFrame. Thus, a dict of Series plus a specific index will discard all data not matching up to the passed index.

If axis labels are not passed, they will be constructed from the input data based on common sense rules.

### From dict of Series or dicts
The result index will be the union of the indexes of the various Series. If there are any nested dicts, these will first be converted to Series. If no columns are passed, the columns will be the dict keys; recent pandas versions keep them in the dict's insertion order, whereas older versions sorted them.

In [None]:
data={
    'name':pd.Series(["ram", "Gopal","Krishna"], index=[1,2,3]),
    'age':pd.Series(["21", "31", "29"],index=[1,2,3]),
    'city':pd.Series(["Kathmandu","Surkhet","Nepaljung"], index=[1,2,3])
}
df=pd.DataFrame(data)
df

In [None]:
d = {'one' : pd.Series([1., 2., 3.], index=['a', 'b', 'c']), 
     'two' : pd.Series([1., 2., 3., 4.], index=['a', 'b', 'c', 'd'])
    }
df = pd.DataFrame(d)
df

In [None]:
pd.DataFrame(d, index=['d', 'b', 'a'])

In [None]:
pd.DataFrame(d, index=['a','d'], columns=['two'])

In [None]:
pd.DataFrame(d, index=['d', 'b', 'a'], columns=['two', 'three'])

The row and column labels are available through the index and columns attributes, respectively:

In [None]:
df.index

In [None]:
df.columns

### From dict of array-likes
The array-likes must all be the same length. If an index is passed, it must also be the same length as the arrays. If no index is passed, the index will be range(n), where n is the array length.

In [None]:
d = {'one' : [1., 2., 3., 4.], 'two' : [4., 3., 2., 1.]}
pd.DataFrame(d)

In [None]:
pd.DataFrame(d, index=['a', 'b', 'c', 'd'])
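
Both length requirements can be verified with a short sketch. The dictionaries below are invented for illustration; each invalid call is expected to raise a ValueError.

```python
import pandas as pd

# Columns of unequal length cannot be combined into one frame.
ragged = {"one": [1.0, 2.0, 3.0, 4.0], "two": [4.0, 3.0, 2.0]}
try:
    pd.DataFrame(ragged)
    ragged_raised = False
except ValueError as exc:
    ragged_raised = True
    print("ValueError:", exc)

# An index must have exactly one label per row.
d = {"one": [1.0, 2.0, 3.0, 4.0], "two": [4.0, 3.0, 2.0, 1.0]}
try:
    pd.DataFrame(d, index=["a", "b", "c"])
    index_raised = False
except ValueError as exc:
    index_raised = True
    print("ValueError:", exc)

# With matching lengths and no index, the rows are labelled 0..n-1.
ok = pd.DataFrame(d)
print(list(ok.index))
```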

### From a list of dicts

In [None]:
data2 = [{'a': 1, 'b': 2}, {'a': 5, 'b': 10, 'c': 20}]
pd.DataFrame(data2)

In [None]:
pd.DataFrame(data2, index=['first', 'second'])

In [None]:
pd.DataFrame(data2, columns=['a', 'b'])

### From a dict of tuples

In [None]:
pd.DataFrame({
    ('a', 'b'): {('A', 'B'): 1, ('A', 'C'): 2},
    ('a', 'a'): {('A', 'C'): 3, ('A', 'B'): 4},
    ('a', 'c'): {('A', 'B'): 5, ('A', 'C'): 6},
    ('b', 'a'): {('A', 'C'): 7, ('A', 'B'): 8},
    ('b', 'b'): {('A', 'D'): 9, ('A', 'B'): 10},
})
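
When the keys are tuples, pandas builds hierarchical (MultiIndex) labels on both axes. The reduced sketch below uses only two of the columns from the cell above; the name `frame` is illustrative.

```python
import pandas as pd

frame = pd.DataFrame({
    ("a", "b"): {("A", "B"): 1, ("A", "C"): 2},
    ("a", "a"): {("A", "C"): 3, ("A", "B"): 4},
})

# Outer tuple keys become column labels, inner tuple keys become row labels.
print(type(frame.columns).__name__)
print(type(frame.index).__name__)

# A single cell is addressed with one tuple per axis: row first, then column.
print(frame.loc[("A", "B"), ("a", "b")])
```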

### From a Series
The result will be a DataFrame with the same index as the input Series, and with one column whose name is the original name of the Series (only if no other column name provided).
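
A minimal sketch of this conversion follows. The Series values and the column name "score" are invented for illustration; `Series.to_frame` is shown as one way to supply a different column name.

```python
import pandas as pd

marks = pd.Series([91, 82, 93], index=["hari", "Givina", "Shyam"], name="marks")

# A named Series becomes a one-column frame: the column label is the
# Series name and the index is carried over unchanged.
frame = pd.DataFrame(marks)
print(frame)

# An unnamed Series has no name to reuse, so the column gets the default label 0.
unnamed_frame = pd.DataFrame(pd.Series([1, 2, 3]))
print(unnamed_frame.columns.tolist())

# to_frame() accepts an explicit column name.
renamed_frame = marks.to_frame(name="score")
print(renamed_frame.columns.tolist())
```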