# NumPy Demo
> This notebook explores the NumPy library and the essentials needed for machine learning (ML) and exploratory data analysis (EDA).

In [None]:
!pip install -r requirements.txt

Let's say we want to use climate data such as the temperature, rainfall, and humidity in a region to determine whether the region is well suited for growing apples. A simple approach is to express the relationship between the annual yield of apples (tons per hectare) and the climatic conditions, namely the average temperature (in degrees Fahrenheit), rainfall (in millimeters), and average relative humidity (in percent), as a linear equation.

> yield_of_apples = w1 * temperature + w2 * rainfall + w3 * humidity

**w1, w2, w3** are weights derived from historical data.

> example
![image.png](attachment:image.png)

In [2]:
w1,w2,w3=0.3,0.2,0.5

In [5]:
kanto_temp=73
kanto_rainfall=43
kanto_humidity=67

kanto_yield=w1*kanto_temp + w2*kanto_rainfall + w3*kanto_humidity
print(f"yield of apples in kanto is {kanto_yield} tons per hectare")

yield of apples in kanto is 64.0 tons per hectare


The drawback of this approach is that we create a separate variable for every parameter. Instead, we can use a **vector** to represent the parameters.
Each region can then be represented as a vector, and so can the weights.

In [12]:
#location params temp ,rainfall ,humidity 
kanto=[73,43,67]
johto=[91,88,64]
hoenn=[87,134,58]
sinnoh=[102,43,37]
unova=[69,96,70]

#weights
weights = [0.3, 0.2, 0.5]
def calculate_yield(region):
    """
        Calculate the agricultural yield for a given region based on environmental parameters.
        Inputs:
            - region: A list containing temperature, rainfall, and humidity values.
        Outputs:
            - The calculated yield as a float.
    """
    result=0
    #zip is used to pair each environmental parameter with its corresponding weight
    for param,wt in zip(region,weights):
        result += param * wt
    return result

help(calculate_yield)
kanto_yield = calculate_yield(kanto)
print(f"Kanto Yield: {kanto_yield} tons per hectare")

Help on function calculate_yield in module __main__:

calculate_yield(region)
    Calculate the agricultural yield for a given region based on environmental parameters.
    Inputs:
        - region: A list containing temperature, rainfall, and humidity values.
    Outputs:
        - The calculated yield as a float.

Kanto Yield: 64.0 tons per hectare


In [2]:
import numpy as np

kanto = np.array([73, 43, 67])


weights = np.array([0.3, 0.2, 0.5])
print(type(weights))
print(f"Kanto's humidity: {kanto[2]}%")
kanto_yield = np.dot(kanto, weights)
print(f"Kanto Yield: {int(kanto_yield)} tons per hectare")

<class 'numpy.ndarray'>
Kanto's humidity: 67%
Kanto Yield: 64 tons per hectare
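
To make clear what `np.dot` computes for two 1D arrays, the short sketch below compares it with an element-wise product followed by a sum. It reuses the `kanto` and `weights` values from the cell above; the names `elementwise` and `manual_result` are introduced here only for illustration.

```python
import numpy as np

kanto = np.array([73, 43, 67])        # temp, rainfall, humidity
weights = np.array([0.3, 0.2, 0.5])

# For 1D arrays, np.dot multiplies matching elements and adds up the products
dot_result = np.dot(kanto, weights)

# The same computation written out in two steps
elementwise = kanto * weights          # contribution of each parameter
manual_result = elementwise.sum()

print(dot_result, manual_result)
print(np.isclose(dot_result, manual_result))
```

Comparing with `np.isclose` rather than `==` avoids surprises from floating-point rounding.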


In [28]:
import time

# plain Python lists a1, a2 already created earlier
result = 0
t0 = time.perf_counter()
for x1, x2 in zip(a1, a2):
    result += x1 * x2
t1 = time.perf_counter()
print(f"Pure Python loop result={result} time={(t1-t0):.4f} s")

# NumPy arrays npa1, npa2 already created earlier
t0 = time.perf_counter()
result_np = np.dot(npa1, npa2)
t1 = time.perf_counter()
print(f"NumPy dot result={result_np} time={(t1-t0):.4f} s")

Pure Python loop result=833332333333500000 time=0.1171 s
NumPy dot result=833332333333500000 time=0.0005 s


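Hence **NumPy** is much faster than a plain Python loop over lists: in the run above, `np.dot` took about 0.0005 s against 0.1171 s for the loop, roughly 200 times faster. NumPy performs the computation in optimized compiled code instead of the Python interpreter.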
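
The timing cell above depends on `a1`, `a2`, `npa1`, and `npa2` from cells that are not shown. The following is a self-contained sketch of the same comparison. The input ranges are an assumption: they were chosen because they reproduce the printed result, not because the notebook states them. Timings will differ from machine to machine.

```python
import time
import numpy as np

# Assumed inputs (not shown in the notebook): two ranges of one million integers
a1 = list(range(1_000_000))
a2 = list(range(1_000_000, 2_000_000))

# int64 is set explicitly so the sum cannot overflow on platforms
# where the default integer type is 32 bits wide
npa1 = np.array(a1, dtype=np.int64)
npa2 = np.array(a2, dtype=np.int64)

# Pure Python: multiply pairs one at a time in the interpreter
t0 = time.perf_counter()
result = 0
for x1, x2 in zip(a1, a2):
    result += x1 * x2
loop_time = time.perf_counter() - t0

# NumPy: a single call that runs in compiled code
t0 = time.perf_counter()
result_np = np.dot(npa1, npa2)
dot_time = time.perf_counter() - t0

print(f"Pure Python loop result={result} time={loop_time:.4f} s")
print(f"NumPy dot result={result_np} time={dot_time:.4f} s")
```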

To compute the yield for all regions, we can create a single multi-dimensional array instead of one array per region.

**NumPy** supports multi-dimensional arrays: 1D, 2D, 3D, and so on.

In [None]:
regionData=np.array([
    [73, 43, 67],
    [80, 50, 60],
    [75, 45, 65]
])

demo3D=np.array([
    [
        [73, 43],
        [80, 50],
        [75, 45]
    ],
    [
        [15, 20],
        [25, 30],
        [35, 40]
    ]
])

print(np.shape(regionData))
print(np.shape(demo3D))

(3, 3)
(2, 3, 2)


`np.shape()` (or the array's `.shape` attribute) returns the size of the array along each dimension.
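
As a small illustration, the sketch below inspects a 2D array with the function form and the attribute form, along with two related attributes, `ndim` and `size`. The array values are the ones used in the cell above.

```python
import numpy as np

regionData = np.array([
    [73, 43, 67],
    [80, 50, 60],
    [75, 45, 65],
])

print(np.shape(regionData))  # function form
print(regionData.shape)      # attribute form, same tuple
print(regionData.ndim)       # number of dimensions
print(regionData.size)       # total number of elements
```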

For efficiency and performance, all elements in a NumPy array must have the same data type.
We can check the type using the **dtype** attribute.
NumPy has its own data types (such as `int64` and `float64`), which differ from Python's built-in types.

> Note: if any number in a NumPy array is a float, all the remaining elements are converted to floats.
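
The conversion described in the note can be observed directly. This is a minimal sketch; the array names `ints`, `mixed`, and `as_int` are chosen here for illustration.

```python
import numpy as np

ints = np.array([73, 43, 67])
mixed = np.array([73, 43.5, 67])   # one float among integers

print(ints.dtype)    # an integer dtype (exact width depends on the platform)
print(mixed.dtype)   # float64: every element was promoted to float
print(mixed)

# astype returns a converted copy; converting to int drops the fractional part
as_int = mixed.astype(np.int64)
print(as_int)
```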

The **matmul** function performs matrix multiplication.
The same operation can be written with the **@** operator.

In [None]:
#predicting the yield of each region
regionData=np.array([
    [73, 43, 67],
    [80, 50, 60],
    [75, 45, 65]
])
weights = np.array([0.3, 0.2, 0.5])
yield_vector=np.matmul(regionData,weights)
print(yield_vector)



[64. 64. 64.]
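
To connect this with the earlier single-region example, the sketch below checks that `np.matmul`, the `@` operator, and a row-by-row `np.dot` all give the same yield vector. The names `via_matmul`, `via_operator`, and `per_row` are illustrative.

```python
import numpy as np

regionData = np.array([
    [73, 43, 67],
    [80, 50, 60],
    [75, 45, 65],
])
weights = np.array([0.3, 0.2, 0.5])

via_matmul = np.matmul(regionData, weights)
via_operator = regionData @ weights

# Each entry of the result is the dot product of one row with the weights
per_row = np.array([np.dot(row, weights) for row in regionData])

print(via_operator.shape)
print(np.allclose(via_matmul, via_operator))
print(np.allclose(via_matmul, per_row))
```

Multiplying a (3, 3) array by a length-3 vector yields a length-3 vector, one yield per region.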


**.genfromtxt()**: `numpy.genfromtxt` reads data from a text file and converts it into a NumPy array. It is typically used for data with missing or inconsistent values, which it can replace with fill values while loading. This is useful in machine learning, where missing or inconsistent data must be filled in before operations can be performed.
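
A minimal sketch of the missing-value handling follows. It reads from an in-memory string through `io.StringIO` instead of a file, and the two data rows and the column names are made up for illustration. By default an empty field becomes `nan`; passing `filling_values` substitutes another value.

```python
import io
import numpy as np

# Hypothetical CSV text; the second data row has no rainfall value
text = "temperature,rainfall,humidity\n73,43,67\n91,,64\n"

# Default behaviour: the empty field is loaded as nan
data = np.genfromtxt(io.StringIO(text), delimiter=",", skip_header=1)
print(data)

# filling_values replaces missing fields while loading
filled = np.genfromtxt(io.StringIO(text), delimiter=",", skip_header=1,
                       filling_values=0.0)
print(filled)
```

Whether 0 is a sensible fill value depends on the data; a column mean is a common alternative.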

**.concatenate()**: `numpy.concatenate` joins two or more arrays along a specified axis. It is useful for combining datasets and restructuring arrays.

![image.png](attachment:image.png)
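
The effect of the `axis` argument is easiest to see on a small array. In this sketch the two-region `data` array and the `extra_row` values are taken from the region lists earlier in the notebook; the variable names are illustrative.

```python
import numpy as np

data = np.array([
    [73, 43, 67],
    [91, 88, 64],
])
weights = np.array([0.3, 0.2, 0.5])
yields = data @ weights                     # 1D array, one value per row

# axis=1 appends a column; the 1D yields must first become a (2, 1) column
with_column = np.concatenate((data, yields.reshape(-1, 1)), axis=1)
print(with_column.shape)

# axis=0 appends rows; the new row must have the same number of columns
extra_row = np.array([[87, 134, 58]])
with_row = np.concatenate((data, extra_row), axis=0)
print(with_row.shape)
```

`reshape(-1, 1)` lets NumPy infer the number of rows, so the same line works for any dataset size.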

In [None]:
climateData=np.genfromtxt("weather_data.txt", delimiter=",", skip_header=1)
climateData.shape
yieldOfApples=climateData @ weights
yieldOfApples
#we are adding a column, so axis=1; with axis=0 the arrays would be joined as rows
#reshape(-1,1) turns the 1D yield vector into a column without hard-coding the row count
climateResults=np.concatenate((climateData,yieldOfApples.reshape(-1,1)),axis=1)
np.savetxt("climate_results.txt",climateResults,fmt="%.4f",delimiter=",",header="temp,rainfall,humidity,yield",comments='')

> References https://numpy.org/doc/stable/reference/routines.html