# NumPy

**NumPy** (Numerical Python) is the core library for scientific computing in Python. It provides a high-performance multidimensional array object (`ndarray`) and tools for working with these arrays.

In [43]:
import numpy as np

In [67]:
a = np.array([1, 2, 3, 4, 5, 6, 7, 8, 8])
print(type(a))
print(a.shape)
print(a.dtype)

<class 'numpy.ndarray'>
(9,)
int32
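The `int32` above reflects the platform the notebook ran on. In NumPy 1.x the default integer typically follows the platform's C `long`, which is 32-bit on Windows and 64-bit on most Linux/macOS builds; NumPy 2.0 made Windows default to 64-bit as well. A minimal sketch, assuming you want the same dtype everywhere, is to request it explicitly:

```python
import numpy as np

# Request a specific integer width instead of relying on the platform default
a64 = np.array([1, 2, 3, 4, 5, 6, 7, 8, 8], dtype=np.int64)
print(a64.dtype)     # int64
print(a64.itemsize)  # 8 bytes per element

# astype() returns a new array converted to the requested dtype
a32 = a64.astype(np.int32)
print(a32.dtype)     # int32
```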


In [45]:
a[8] = 9
a = np.append(a, 10)
print(a)

[ 1  2  3  4  5  6  7  8  9 10]
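Note that `np.append` does not grow the array in place: it returns a new array, which is why the result is reassigned to `a` above. A short sketch (the list of squares is purely illustrative):

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.append(a, 4)   # allocates a new array; `a` is unchanged
print(a)  # [1 2 3]
print(b)  # [1 2 3 4]

# Appending in a loop copies the whole array on every call, so it is
# usually cheaper to collect values in a Python list and convert once
values = []
for i in range(5):
    values.append(i * i)
squares = np.array(values)
print(squares)  # [ 0  1  4  9 16]
```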


In [49]:
print(a)
print('\n')

print(a[0:5])
print(a[0:5:2])
print(a[0:-1])
print(a[4::-1])
print(a[5:0:-2])

[ 1  2  3  4  5  6  7  8  9 10]


[1 2 3 4 5]
[1 3 5]
[1 2 3 4 5 6 7 8 9]
[5 4 3 2 1]
[6 4 2]
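Basic slices like the ones above are views onto the original data rather than copies, so writing to a slice also changes the source array. A minimal sketch of the difference, using `.copy()` when an independent array is needed:

```python
import numpy as np

a = np.arange(1, 11)     # [ 1  2 ...  10]
first_five = a[0:5]      # basic slicing returns a view
first_five[0] = 100      # writes through to `a`
print(a[0])              # 100

b = np.arange(1, 11)
safe = b[0:5].copy()     # an explicit copy owns its own data
safe[0] = 100
print(b[0])              # 1
```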


In [48]:
print('Vector max %d, min %d, mean %.2f, median %.2f, standard deviation %.2f and total sum %d' %
      (a.max(), np.min(a), a.mean(), np.median(a), a.std(), a.sum()))

Vector max 10, min 1, mean 5.50, median 5.50, standard deviation 2.87 and total sum 55
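`a.std()` is the population standard deviation: NumPy's default `ddof=0` divides the sum of squared deviations by n. Passing `ddof=1` divides by n - 1 instead, giving the sample standard deviation:

```python
import numpy as np

a = np.arange(1, 11)
pop_std = a.std()            # divides by n      (ddof=0, NumPy default)
sample_std = a.std(ddof=1)   # divides by n - 1
print('%.2f %.2f' % (pop_std, sample_std))  # 2.87 3.03
```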


In [56]:
a[a > a.mean()]

array([ 6,  7,  8,  9, 10])
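Boolean masks can be combined with `&` (and), `|` (or) and `~` (not); Python's `and`/`or` do not work element-wise on arrays. Each comparison needs its own parentheses because `&` binds more tightly than `>` or `==`. A short sketch:

```python
import numpy as np

a = np.arange(1, 11)
mask = (a > 3) & (a % 2 == 0)   # greater than 3 AND even
print(mask.sum())               # 4 elements match
print(a[mask])                  # [ 4  6  8 10]

# Masks also work on the left-hand side of an assignment
a[a > 8] = 0
print(a)                        # [1 2 3 4 5 6 7 8 0 0]
```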

In [57]:
a[[4, 1, 5]]

array([5, 2, 6])

In [58]:
# Sorted array
print(np.sort(a))

# Order of indices in sorted array
print(np.argsort(a))

[ 1  2  3  4  5  6  7  8  9 10]
[0 1 2 3 4 5 6 7 8 9]
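Because `a` is already sorted, `argsort` simply returns 0 to 9 above. On unsorted data (the values below are illustrative) the indices are more telling: indexing with them reproduces the sorted array, and reversing them gives a descending order.

```python
import numpy as np

x = np.array([30, 10, 20])
order = np.argsort(x)
print(order)            # [1 2 0]: x[1] is smallest, then x[2], then x[0]
print(x[order])         # [10 20 30], same as np.sort(x)
print(x[order[::-1]])   # [30 20 10], descending
```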


In [63]:
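a = np.array([1, 2, 3])
b = np.array([2, 3, 4])
# Arithmetic is element-wise. A notebook cell displays only the value of
# its last expression, so only a + b appears in the output below.
a * b
a - b
a + b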

array([3, 5, 7])
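To make the distinction explicit: `*` multiplies element by element, `@` computes the dot (matrix) product, and a scalar operand is broadcast to every element. A short sketch with the same vectors:

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([2, 3, 4])

print(a * b)    # [ 2  6 12]  element-wise product
print(a @ b)    # 20          dot product: 1*2 + 2*3 + 3*4
print(a * 10)   # [10 20 30]  scalar broadcast to every element
```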

In [64]:
m_a = np.array([[1, 2, 3, 4],
                [13, 3, 8, 2],
                [8, 7, 2, 3]])
print(m_a.shape)

(3, 4)
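A 2-D array is indexed with one position per axis, `[row, column]`, and `:` selects everything along an axis. A short sketch on the same matrix:

```python
import numpy as np

m_a = np.array([[1, 2, 3, 4],
                [13, 3, 8, 2],
                [8, 7, 2, 3]])

print(m_a[1, 0])               # 13: row 1, column 0
print(m_a[0])                  # [1 2 3 4]: the whole first row
print(m_a[:, 2])               # [3 8 2]: the whole third column
print(m_a.reshape(2, 6).shape) # (2, 6): same 12 values, new shape
```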


In [None]:
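print(m_a.max())        # 13: maximum over all elements
print(m_a.max(axis=0))  # [13 7 8 4]: maximum of each column
print(m_a.max(axis=1))  # [4 13 8]: maximum of each row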

In [None]:
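print(np.sort(m_a, axis=0))  # sorts each column: rows become [1 2 2 2], [8 3 3 3], [13 7 8 4]
print(np.sort(m_a, axis=1))  # sorts each row: [1 2 3 4], [2 3 8 13], [2 3 7 8]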
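The same `axis` argument applies to other reductions: `axis=0` collapses the rows and yields one result per column, while `axis=1` collapses the columns and yields one result per row. A short sketch with `sum` and `argmax`:

```python
import numpy as np

m_a = np.array([[1, 2, 3, 4],
                [13, 3, 8, 2],
                [8, 7, 2, 3]])

print(m_a.sum(axis=0))     # [22 12 13  9]: one total per column
print(m_a.sum(axis=1))     # [10 26 20]: one total per row
print(m_a.argmax(axis=1))  # [3 0 0]: column index of each row's maximum
```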

# Intro to Pandas data structures

In [69]:
import pandas as pd

## Series

[Series](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.html#pandas.Series) is a one-dimensional labeled array capable of holding any data type (integers, strings, floating-point numbers, Python objects, etc.). The axis labels are collectively referred to as the index. The basic way to create a Series is to call `pd.Series(data, index=index)`:

In [99]:
# Index labels need not be unique: 'second' appears twice
s = pd.Series(data=[39.4, 91.2, 80.5, 20.3, 4.2, -13.4],
              index=['first', 'second', 'second', 'third', 'fourth', 'fifth'])
print(type(s))
print(s.shape)
print(s.dtype)
print(s['second'])

<class 'pandas.core.series.Series'>
(6,)
float64
second    91.2
second    80.5
dtype: float64
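As the output shows, looking up a duplicated label returns a Series of all matches rather than a single value. A short sketch contrasting this with a unique label and with positional access through `.iloc` (the three-element Series is a trimmed copy of the one above):

```python
import pandas as pd

s = pd.Series([39.4, 91.2, 80.5],
              index=['first', 'second', 'second'])

print(s.index.is_unique)    # False
print(s['first'])           # 39.4: a label that occurs once returns a scalar
print(type(s['second']))    # a duplicated label returns a Series
print(s.iloc[2])            # 80.5: positional access ignores the labels
```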


In [100]:
s = pd.Series(data=[39.4, 91.2, 20.3, 4.2, -13.4])
s

0    39.4
1    91.2
2    20.3
3     4.2
4   -13.4
dtype: float64

A Series behaves much like an [ndarray](https://numpy.org/doc/stable/reference/generated/numpy.ndarray.html) and is a valid argument to most [NumPy](https://numpy.org/doc/stable/user/whatisnumpy.html) functions. However, slicing a Series also slices its index, so the result keeps the original labels.

In [77]:
s[1:4]

1    91.2
2    20.3
3     4.2
dtype: float64
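Slicing can be done by position with `.iloc` or by label with `.loc`, and the two treat the end point differently: positional slices exclude it, label slices include it. A short sketch with illustrative string labels:

```python
import pandas as pd

s = pd.Series([39.4, 91.2, 20.3, 4.2, -13.4],
              index=['a', 'b', 'c', 'd', 'e'])

print(s.iloc[1:3].tolist())        # [91.2, 20.3]: positional, end excluded
print(s.loc['b':'d'].tolist())     # [91.2, 20.3, 4.2]: by label, end included
print(s.iloc[1:3].index.tolist())  # ['b', 'c']: labels travel with the values
```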

In [89]:
# Only the last expression (s.std()) is displayed by the notebook
np.max(s)
s.min()
s.std()

40.19736309759634
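The value above is the sample standard deviation: pandas' `Series.std` defaults to `ddof=1`, whereas NumPy's `np.std` defaults to `ddof=0`, so the two libraries give different results on the same data unless `ddof` is set explicitly. A short sketch:

```python
import numpy as np
import pandas as pd

s = pd.Series([39.4, 91.2, 20.3, 4.2, -13.4])

sample = s.std()                       # pandas default: ddof=1 (divide by n - 1)
population = s.std(ddof=0)             # divide by n
numpy_default = np.std(s.to_numpy())   # NumPy default: ddof=0
print(sample, population, numpy_default)
```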

In [90]:
a = pd.Series([1, 2, 3])
b = pd.Series([2, 3, 4])

# Element-wise arithmetic; the notebook displays only the last result (a * b)
a * 2
a + b
a - b
a * b

0     2
1     6
2    12
dtype: int64
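Unlike NumPy arrays, Series arithmetic matches values by index label, not by position. When the two indexes differ, labels present in only one operand produce `NaN`; the `add` method's `fill_value` parameter treats missing labels as a given value instead. A short sketch with illustrative labels:

```python
import pandas as pd

a = pd.Series([1, 2, 3], index=['x', 'y', 'z'])
b = pd.Series([10, 20, 30], index=['y', 'z', 'w'])

total = a + b                      # aligned on the union of labels
print(total)                       # 'w' and 'x' are NaN, y = 12, z = 23

filled = a.add(b, fill_value=0)    # a missing label counts as 0
print(filled)                      # w = 30, x = 1, y = 12, z = 23
```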

## DataFrame

DataFrame is a 2-dimensional labeled data structure with columns of potentially different types. You can think of it like a spreadsheet or SQL table, or a dict of Series objects. It is generally the most commonly used pandas object.

In [101]:
d = {"one": pd.Series([1.0, 2.0, 3.0], index=["a", "b", "c"]),
     "two": pd.Series([1.0, 2.0, 3.0, 4.0], index=["a", "b", "c", "d"])}
df = pd.DataFrame(d)
df

   one  two
a  1.0  1.0
b  2.0  2.0
c  3.0  3.0
d  NaN  4.0
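When a DataFrame is built from a dict of Series, its row index is the union of the Series' indexes, which is why row `d` holds `NaN` in column `one`. Each column can be pulled back out as a Series. A short sketch of inspecting and dropping the missing value:

```python
import pandas as pd

d = {"one": pd.Series([1.0, 2.0, 3.0], index=["a", "b", "c"]),
     "two": pd.Series([1.0, 2.0, 3.0, 4.0], index=["a", "b", "c", "d"])}
df = pd.DataFrame(d)

print(type(df["one"]))             # each column is a Series
print(df.loc["d", "one"])          # nan: "one" had no label 'd'
print(df["one"].isna().sum())      # 1
print(df.dropna().index.tolist())  # ['a', 'b', 'c']: rows with any NaN removed
```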


In [106]:
d = {"one": [1.0, 2.0, 3.0, 4.0], "two": [4.0, 3.0, 2.0, 1.0]}
df = pd.DataFrame(d)
df 

   one  two
0  1.0  4.0
1  2.0  3.0
2  3.0  2.0
3  4.0  1.0
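Columns support the same vectorized arithmetic as Series, and assigning to a new key adds a column. A short sketch on the DataFrame above (the column name `three` is illustrative):

```python
import pandas as pd

df = pd.DataFrame({"one": [1.0, 2.0, 3.0, 4.0], "two": [4.0, 3.0, 2.0, 1.0]})

# Multiply two columns row by row and store the result as a new column
df["three"] = df["one"] * df["two"]
print(df["three"].tolist())   # [4.0, 6.0, 6.0, 4.0]
print(df.mean().to_dict())    # {'one': 2.5, 'two': 2.5, 'three': 5.0}
print(df.shape)               # (4, 3)
```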
