<a href="https://colab.research.google.com/github/tomersk/learn-python/blob/main/03_01.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# 3 Arrays and DataFrames

When we need to store more than one value of a variable, e.g. daily temperature over a long time period, data over space, or data over both space and time, we can use an array to store it. Arrays can have more than one dimension.
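
A minimal sketch of a 1-dimensional and a 2-dimensional array, created with numpy's np.array. The temperature values are hypothetical and only illustrate the layout: one value per day for a single location, and one row per day and one column per station for data over space and time.

```python
import numpy as np

# Hypothetical daily temperatures (deg C) at one location: a 1-D array
temp_daily = np.array([21.5, 22.0, 23.1, 22.8])
print(temp_daily.ndim, temp_daily.shape)        # 1 (4,)

# Hypothetical temperatures for 4 days (rows) at 3 stations (columns): a 2-D array
temp_stations = np.array([[21.5, 19.0, 24.2],
                          [22.0, 19.4, 24.8],
                          [23.1, 20.1, 25.0],
                          [22.8, 19.8, 24.6]])
print(temp_stations.ndim, temp_stations.shape)  # 2 (4, 3)
```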

When we store multiple variables, a DataFrame is a useful structure.
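
In Python, "DataFrame" usually refers to pandas.DataFrame. As a minimal sketch, assuming pandas is installed and using hypothetical column names and values, each column holds one variable and each row holds one observation (here, one day):

```python
import pandas as pd

# Hypothetical daily observations of two variables at one station
df = pd.DataFrame(
    {
        "temperature": [21.5, 22.0, 23.1],  # deg C
        "rainfall": [0.0, 4.2, 1.3],        # mm
    },
    index=pd.date_range("2020-01-01", periods=3, freq="D"),  # one row per day
)

print(df)
print(df["rainfall"].mean())  # a single column can be used much like a 1-D array
```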

## 3.1 Generating sequential arrays
Often we need vectors whose elements follow a simple order, for example a vector containing elements:
*   [10, 11, 12, 13], or 
*   [5, 10, 15, 20], or
*   [1.0, 1.2, 1.4, 1.6, 1.8, 2.0]. 

Because the items in these vectors follow a simple order, it is convenient to have easy ways to define such vectors. Some of the ways to create them are the following:

### 3.1.1 linspace
If we want to generate a vector whose elements are uniformly spaced, and we know the lower limit, the upper limit and the number of elements, then linspace is the preferred choice.

In [1]:
import numpy as np
np.linspace(0, 2, 9)

array([0.  , 0.25, 0.5 , 0.75, 1.  , 1.25, 1.5 , 1.75, 2.  ])

Because linspace is part of the numpy library, we first import the library and give it the abbreviated name np. Then we call linspace with the lower limit, the upper limit and the number of elements to generate. In this example, 0 is the lower limit, 2 is the upper limit, and the number of elements is 9. Both limits are included in the result, so the 9 elements enclose 8 equal intervals and the spacing is (2 - 0)/8 = 0.25.
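
To check the spacing, linspace can return it directly when called with retstep=True. The sketch below also shows endpoint=False, which leaves out the upper limit; both are standard linspace parameters, and the numbers are just the ones from the example above.

```python
import numpy as np

start, stop, num = 0, 2, 9

# retstep=True makes linspace also return the spacing it used
x, step = np.linspace(start, stop, num, retstep=True)
print(step)                        # 0.25

# With the default endpoint=True both limits are included, so num points
# enclose (num - 1) intervals and the spacing is (stop - start) / (num - 1)
print((stop - start) / (num - 1))  # 0.25

# With endpoint=False the upper limit is left out and the spacing is (stop - start) / num
y = np.linspace(start, stop, 8, endpoint=False)
print(y)
```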

Let us generate one more vector to understand this function better. This time we take the lower limit as 0, the upper limit as $2\pi$, and the number of elements as 100.

In [4]:
x = np.linspace(0, 2*np.pi, 100)
print(x)

[0.         0.06346652 0.12693304 0.19039955 0.25386607 0.31733259
 0.38079911 0.44426563 0.50773215 0.57119866 0.63466518 0.6981317
 0.76159822 0.82506474 0.88853126 0.95199777 1.01546429 1.07893081
 1.14239733 1.20586385 1.26933037 1.33279688 1.3962634  1.45972992
 1.52319644 1.58666296 1.65012947 1.71359599 1.77706251 1.84052903
 1.90399555 1.96746207 2.03092858 2.0943951  2.15786162 2.22132814
 2.28479466 2.34826118 2.41172769 2.47519421 2.53866073 2.60212725
 2.66559377 2.72906028 2.7925268  2.85599332 2.91945984 2.98292636
 3.04639288 3.10985939 3.17332591 3.23679243 3.30025895 3.36372547
 3.42719199 3.4906585  3.55412502 3.61759154 3.68105806 3.74452458
 3.8079911  3.87145761 3.93492413 3.99839065 4.06185717 4.12532369
 4.1887902  4.25225672 4.31572324 4.37918976 4.44265628 4.5061228
 4.56958931 4.63305583 4.69652235 4.75998887 4.82345539 4.88692191
 4.95038842 5.01385494 5.07732146 5.14078798 5.2042545  5.26772102
 5.33118753 5.39465405 5.45812057 5.52158709 5.58505361 5.648520
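
An upper limit of $2\pi$ is a natural choice when a periodic function is to be sampled over one full cycle. As an illustrative sketch (applying np.sin here is our addition, not part of the original example), numpy functions such as np.sin act on every element of the array at once. The sketch also confirms that the spacing seen in the output, 0.06346652, is $2\pi/99$, since 100 points enclose 99 intervals.

```python
import numpy as np

x = np.linspace(0, 2*np.pi, 100)
y = np.sin(x)                           # sine of every element, no explicit loop

print(y.shape)                          # (100,): one value per element of x
print(np.isclose(x[1], 2*np.pi/99))     # True: 100 points enclose 99 intervals
print(x[-1] == 2*np.pi)                 # True: linspace includes the upper limit
print(y.max())                          # close to, but not exactly, 1 (no sample lies exactly at pi/2)
```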

By default, linspace generates 50 elements, so if we do not specify the number of elements we get 50 equally spaced elements. We can use the len function to get the length of an array.

In [5]:
foo = np.linspace(0,1)
len(foo)

50
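
For a multi-dimensional array, len reports only the size of the first dimension. A short sketch of the shape and size attributes, which give the full picture (np.zeros is used here only to build a 2-D array quickly):

```python
import numpy as np

foo = np.linspace(0, 1)                # default: 50 elements
print(len(foo), foo.shape, foo.size)   # 50 (50,) 50
print(foo[1] - foo[0])                 # spacing is 1/49 with the default 50 points

bar = np.zeros((3, 2))                 # a 2-D array with 3 rows and 2 columns
print(len(bar))                        # 3: len counts only the first dimension (rows)
print(bar.shape, bar.size)             # (3, 2) 6
```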

### 3.1.2 arange
Suppose again that we want to generate a vector whose elements are uniformly spaced, but this time we do not know the number of elements; we only know the increment between elements. In such a situation arange is used. arange also requires lower and upper bounds. In the following example we want to generate a vector with 10 as its lowest element, 30 as its highest element and an increment of 5. Based on what we know from linspace, we might write:


In [6]:
np.arange(10, 30, 5)

array([10, 15, 20, 25])

Oh! What happened? Why did Python not print 30? Because arange does not include its second argument (the stop value) in the result: it starts from the first argument and stops before reaching the second. So if we want the vector to go up to 30, we write:

In [7]:
np.arange(10, 31, 5)

array([10, 15, 20, 25, 30])
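
The stop value is excluded in the same way as in Python's built-in range, and the NumPy documentation gives the number of elements as ceil((stop - start) / step). A minimal sketch comparing the two, using the numbers from the example above:

```python
import math
import numpy as np

start, stop, step = 10, 31, 5

a = np.arange(start, stop, step)
r = list(range(start, stop, step))   # built-in range excludes stop in the same way
print(a)                             # [10 15 20 25 30]
print(r)                             # [10, 15, 20, 25, 30]

# Number of elements generated: ceil((stop - start) / step)
print(math.ceil((stop - start) / step), len(a))   # 5 5

# With a single argument, arange starts at 0 with an increment of 1
print(np.arange(5))                  # [0 1 2 3 4]
```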

This time we get the required output. arange can also take a float increment. Let us generate a vector with a lower bound of 0, an upper bound of 2 and an increment of 0.3.

In [8]:
np.arange(0, 2, 0.3) # it accepts float arguments

array([0. , 0.3, 0.6, 0.9, 1.2, 1.5, 1.8])

With a float increment too, the largest generated element is normally less than the second argument given to arange; floating-point rounding can occasionally break this rule.
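
The "stays below the second argument" rule is not fully reliable with float increments, because decimal steps such as 0.1 cannot be represented exactly in binary. In the sketch below (values chosen because they trigger the rounding), the quotient (1.3 - 1)/0.1 comes out slightly above 3, so arange produces 4 elements and reaches the upper limit. The NumPy documentation recommends linspace for non-integer steps, since there the number of elements is given explicitly:

```python
import numpy as np

a = np.arange(1, 1.3, 0.1)
print(len(a))          # 4, not the 3 elements one might expect
print(a[-1] >= 1.3)    # True: the last element is not below the upper limit

# linspace avoids the problem: we state that we want exactly 3 elements
b = np.linspace(1, 1.2, 3)
print(len(b), b[-1])   # 3 1.2
```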

### 3.1.3 zeros
zeros is used when we want all the elements of a vector to be 0. By default the elements are floating-point numbers, hence the trailing dot in the output.

In [9]:
foo = np.zeros(5)
print(foo)

[0. 0. 0. 0. 0.]
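
A common use of zeros is to preallocate an array of known size and then fill it in a loop. A small sketch of that pattern (the squares are just placeholder values):

```python
import numpy as np

n = 5
squares = np.zeros(n)      # preallocate 5 elements, all 0.0 (float by default)
for i in range(n):
    squares[i] = i**2      # overwrite one element per iteration

print(squares)             # [ 0.  1.  4.  9. 16.]
```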


### 3.1.4 ones
ones is used when all the elements of a vector should be 1. Say we want to generate a variable foo whose elements are all equal to one and whose dimension is 3×2, i.e. 3 rows and 2 columns.

In [10]:
foo = np.ones((3,2))
foo

array([[1., 1.],
       [1., 1.],
       [1., 1.]])

Remember that if there is more than one dimension, the dimensions are given as a tuple, e.g. (2, 5).
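
A short sketch of why the tuple matters: a single integer is enough for a 1-D array, but passing the dimensions as separate arguments does not work, because the second positional argument of ones is the data type, so NumPy tries to interpret 2 as a type and raises a TypeError.

```python
import numpy as np

a = np.ones(4)            # 1-D: a single integer is enough
b = np.ones((3, 2))       # 2-D: the dimensions go in a tuple
print(a.shape, b.shape)   # (4,) (3, 2)

# Wrong: the second positional argument is the data type, not a dimension
raised = False
try:
    np.ones(3, 2)
except TypeError as err:
    raised = True
    print("TypeError:", err)
```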


### 3.1.5 empty
*empty* is useful for initializing variables. It allocates the array without setting its elements, so they contain garbage values (whatever happened to be in that memory), which are to be overwritten later.

In [11]:
foo = np.empty((2,5))
foo

array([[1.27430123e-316, 1.77863633e-322, 0.00000000e+000,
        0.00000000e+000, 0.00000000e+000],
       [2.23289529e+180, 1.74254942e-076, 4.95643980e-090,
        6.28756268e-066, 3.99473231e-315]])
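
Because the initial contents are arbitrary, every element of an array created with empty should be written before it is read. A minimal sketch of that pattern, filling a 2×5 array row by row (the row values are arbitrary examples):

```python
import numpy as np

foo = np.empty((2, 5))                    # contents are arbitrary until assigned
for i in range(foo.shape[0]):             # loop over the rows
    foo[i, :] = np.arange(5) * (i + 1)    # overwrite a whole row at once

print(foo)                                # every element now has a defined value
```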

Additionally, in zeros, ones and empty the data type (e.g. int, float) can also be specified, either as the second argument or through the dtype keyword.

In [12]:
foo = np.empty((2,5),int)
foo

array([[           25792144,                  36,                   0,
                          0,                   0],
       [7305181858796548918, 3473451130437056301, 3270793523756872760,
        3631361885789761844,           808542821]])

You can see that all the elements of foo are now integers, even though their values are still meaningless.
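
The dtype keyword makes the intended type explicit, and the dtype attribute reports what an array actually holds. A short sketch; note that the exact integer type (for example int64) depends on the platform:

```python
import numpy as np

a = np.zeros(3, dtype=int)        # integer zeros
b = np.ones((2, 2), dtype=bool)   # boolean array, every element True
c = np.empty(4, dtype=float)      # float64, contents still arbitrary

print(a, a.dtype)   # [0 0 0] and a platform-dependent integer type such as int64
print(b)
print(c.dtype)      # float64
```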

### 3.1.6 rand
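*rand* generates random numbers uniformly distributed over the range 0 to 1 (0 included, 1 excluded). Unlike zeros and ones, it takes the dimensions as separate arguments rather than as a tuple.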

In [13]:
foo = np.random.rand(3,2)
foo

array([[0.0603704 , 0.24580037],
       [0.2110914 , 0.96110024],
       [0.9935456 , 0.33570921]])
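
Two common follow-ups are sketched below: fixing the seed so that every run produces the same numbers, and scaling to another interval [a, b) with a + (b - a) * rand(...). The seed and the interval limits are arbitrary illustrative choices. The sketch also shows numpy's newer Generator interface (np.random.default_rng), whose random method takes the shape as a tuple.

```python
import numpy as np

np.random.seed(42)                        # fix the seed so repeated runs give the same numbers
u = np.random.rand(3, 2)                  # uniform on [0, 1), shape (3, 2)

a, b = 5.0, 10.0                          # illustrative interval limits
v = a + (b - a) * np.random.rand(1000)    # uniform on [a, b)
print(v.min() >= a, v.max() < b)          # True True

# Newer Generator interface: random() takes the shape as a tuple
rng = np.random.default_rng(42)
w = rng.random((3, 2))
print(w.shape)                            # (3, 2)
```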

### 3.1.7 randn
*randn* generates random numbers from the standard normal distribution, i.e. a normal distribution with mean zero and variance one. Like rand, it takes the dimensions as separate arguments.

In [14]:
foo = np.random.randn(2,4)
foo

array([[ 0.90902554,  0.43302782, -0.38472946, -0.56662238],
       [ 0.08164287,  0.30499331,  1.934709  , -0.60454243]])
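
To get normally distributed numbers with another mean $\mu$ and standard deviation $\sigma$, the standard normal values can be shifted and scaled as $\mu + \sigma \cdot$ randn(...). A sketch with arbitrarily chosen $\mu = 10$ and $\sigma = 2$; with many samples, the sample mean and standard deviation come out close to these values:

```python
import numpy as np

np.random.seed(0)                # for reproducible output
mu, sigma = 10.0, 2.0            # illustrative mean and standard deviation
z = np.random.randn(100_000)     # standard normal: mean 0, variance 1
x = mu + sigma * z               # shift and scale to mean mu, std sigma

print(x.mean(), x.std())         # close to 10 and 2
```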