# ***Finishing Up the Projects***

## ***Finding out the odd squares in a given range***
```
Define function: odd_squares
-- input : num1(lower_limit) , num2(upper_limit)
-- Loop through the numbers in the given range, i.e. range(num1,num2+1) ==> i
---- if ==> i == (int(i**0.5))**2 and i%2!=0
------ output: print( all the odd squares (i) between the given range)
```
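
A minimal Python sketch of the pseudocode above. It returns the odd squares as a list as well as printing them, so the result can be reused. It also swaps the floating-point `int(i**0.5)` test for `math.isqrt()` (Python 3.8+), which gives an exact integer square root even for very large numbers.

```python
import math


def odd_squares(num1, num2):
    """Return (and print) the odd perfect squares in [num1, num2]."""
    result = []
    for i in range(num1, num2 + 1):
        # math.isqrt raises ValueError for negative input, so skip those
        if i < 0:
            continue
        root = math.isqrt(i)            # exact integer square root
        if root * root == i and i % 2 != 0:
            result.append(i)
    print(result)
    return result


odd_squares(1, 50)   # prints [1, 9, 25, 49]
```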



## ***Finding out the armstrong numbers in a given range***
```
Define function: armstrong_number
-- input : num1(lower_limit) , num2(upper_limit)
-- Loop through the numbers in the given range, i.e. range(num1,num2+1) ==> i
---- convert 'i' to string, i.e. string = str(i)
---- number_of_digit = len(string)
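---- total = 0      (named 'total' so it does not shadow Python's built-in sum)
---- Loop through the individual digits of 'i' ==> j
------ total = total + int(j)**number_of_digit
---- if ==> total == i      (check only after all digits have been added)
------ output: print( all the armstrong numbers (i) between the given range)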
```
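
A minimal Python sketch of the Armstrong-number pseudocode, assuming nonnegative limits (`str()` of a negative number would include the `-` sign). Note that the comparison happens only after every digit has been added; checking inside the digit loop would, for example, report 370 twice, because its trailing 0 adds nothing after the partial sum already equals 370.

```python
def armstrong_number(num1, num2):
    """Return (and print) the Armstrong numbers in [num1, num2]."""
    result = []
    for i in range(num1, num2 + 1):
        digits = str(i)
        number_of_digit = len(digits)
        # add up every digit raised to the number of digits ...
        total = sum(int(j) ** number_of_digit for j in digits)
        # ... and compare only after all digits have been added
        if total == i:
            result.append(i)
    print(result)
    return result


armstrong_number(100, 999)   # prints [153, 370, 371, 407]
```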




## ***Finding out the prime numbers in a given range***
```
Define function: prime_number
-- input : num1(lower_limit) , num2(upper_limit)
-- Loop through the numbers in the given range, i.e. range(max(num1, 2), num2+1) ==> i
---- is_prime = True
---- Loop through all the numbers from 2 to (i-1), range(2,i) ==> j
------ if ==> i % j == 0
-------- is_prime = False
-------- break
---- if ==> is_prime
------ output: print( all the prime numbers (i) between the given range)
```
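
A minimal Python sketch of the prime-number pseudocode. It adds two common refinements that are not part of the original pseudocode: divisors are only checked up to the square root of `i`, and Python's `for ... else` replaces the `is_prime` flag.

```python
import math


def prime_number(num1, num2):
    """Return (and print) the prime numbers in [num1, num2]."""
    result = []
    # 0 and 1 are not prime, so start at 2 at the earliest
    for i in range(max(num1, 2), num2 + 1):
        # a divisor larger than sqrt(i) would pair with one smaller than sqrt(i),
        # so checking up to math.isqrt(i) is enough
        for j in range(2, math.isqrt(i) + 1):
            if i % j == 0:
                break
        else:
            # the else branch runs only when the loop finished without a break
            result.append(i)
    print(result)
    return result


prime_number(1, 30)   # prints [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
```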

# ***Modules in Python***



*In Python, a module is a file containing functions, variables, classes, and other reusable blocks of code. Modules let you break a complex project into smaller, more manageable chunks. Collections of related modules are often distributed as packages or libraries. One benefit of using modules is that they reduce redundancy in your code.*

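*For example, if you have a function that you use frequently, you can put it in a module and import it into your program whenever you need it. This way, you don’t have to write the same code over and over again. Here are some commonly used modules (`os`, `time`, and `math` ship with Python's standard library, while `matplotlib`, `numpy`, and `pandas` are third-party packages that must be installed separately, e.g. with pip):*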


- ***os:*** *This module provides a portable way of using operating-system-dependent functionality, such as working with files, directories, paths, and environment variables.*

- ***time:*** *This module provides various time-related functions. It is useful for getting the current time, measuring the time taken by a program to execute, and more.*

- ***math:*** *This module provides access to the mathematical functions defined by the C standard.*

- ***matplotlib:*** *This module is used for data visualization. It is one of the most widely used plotting libraries in Python.*

- ***numpy:*** *This module is used for numerical computing with Python. It provides an array object which can be thought of as a more versatile form of a Python list, along with tools to reshape and restructure arrays. In simple terms, NumPy arrays give us more flexibility than lists, plus fast element-wise math on whole arrays.*

- ***pandas:*** *This module is used for data manipulation and analysis. It provides data structures for efficiently storing and manipulating large datasets.*
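
A short sketch of the three standard-library modules from the list above. The path `data/purchases.csv` is only an example string; nothing is read from or written to disk.

```python
import math
import os
import time

# os: build a path in a platform-independent way and look up the working directory
path = os.path.join("data", "purchases.csv")   # "data/purchases.csv" on Linux/macOS
cwd = os.getcwd()

# time: measure how long a piece of code takes to run
start = time.perf_counter()
total = sum(range(1_000_000))
elapsed = time.perf_counter() - start

# math: mathematical functions and constants
hypotenuse = math.sqrt(3**2 + 4**2)   # 5.0
circle_area = math.pi * 2**2

print(path, cwd, total, f"{elapsed:.4f} s", hypotenuse, round(circle_area, 3))
```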


***Using modules makes our Python code concise and clear. We need to import the required modules before we can use the functions they provide. We can also assign an alias with `as` when a module's name is long to type.***



```
import numpy
import pandas
import matplotlib.pyplot as plt
```
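
To illustrate "put it in a module and import it", here is a sketch that writes a tiny module file and imports it under an alias. The module name `my_helpers` and its function `is_odd_square` are hypothetical. In a real project you would simply save `my_helpers.py` next to your notebook and write `import my_helpers as mh`; the `tempfile` and `sys.path` steps only make this sketch self-contained.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Write a tiny module file; "my_helpers" is a hypothetical name for illustration
module_dir = tempfile.mkdtemp()
Path(module_dir, "my_helpers.py").write_text(
    "def is_odd_square(n):\n"
    "    root = int(n ** 0.5)\n"
    "    return root * root == n and n % 2 != 0\n"
)

# Make the folder importable, then import the module under a short alias
sys.path.insert(0, module_dir)
importlib.invalidate_caches()   # make sure the newly written file is noticed
import my_helpers as mh

print(mh.is_odd_square(49))   # True
print(mh.is_odd_square(36))   # False
```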



# ***Numpy***

***Numpy is the core library for scientific computing in Python. It provides a high-performance multidimensional array object, and tools for working with these arrays.***




***Creating an array: You can create an array using the `np.array()` function (or `numpy.array()` when numpy is imported without an alias, as in the next cell). For example, `a = np.array([1, 2, 3])` creates an array with the values 1, 2, and 3.***

In [None]:
import numpy              # the same import as above, repeated so the cell runs on its own
a = numpy.array([1, 2, 3])
print(a)

***Arrays:***
> ***A NumPy array is a grid of values, all of the `same type`, and is indexed by a tuple of nonnegative integers starting with zero. The number of dimensions is the `rank` of the array; the shape of an array is a tuple of integers giving the size of the array along each dimension.***



> `NOTE: Python lists and NumPy arrays look similar, but NumPy's notion of dimensions (axes) gives the user more control.`
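
To make "rank", "shape", and "same type" concrete, here is a small sketch using the standard `ndarray` attributes `ndim`, `shape`, and `dtype`.

```python
import numpy as np

a = np.array([1, 2, 3])               # rank 1
b = np.array([[1, 2, 3], [4, 5, 6]])  # rank 2

print(a.ndim, a.shape)   # 1 (3,)
print(b.ndim, b.shape)   # 2 (2, 3)

# All elements share one type: mixing ints and a float upcasts everything to float
c = np.array([1, 2, 3.5])
print(c.dtype)           # float64
```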



***We can initialize numpy arrays from nested Python lists, and access elements using square brackets:***


In [None]:
# some basic operations with numpy array
import numpy as np        # we can avoid typing numpy every time by creating a shorthand (alias) for numpy, i.e. np

a = np.array([1, 2, 3])   # Create a rank 1 array
print(type(a))            # Prints "<class 'numpy.ndarray'>"
print(a.shape)            # Prints "(3,)"
print(a[0], a[1], a[2])   # Prints "1 2 3"
a[0] = 5                  # Change an element of the array
print(a)                  # Prints "[5 2 3]"

b = np.array([[1,2,3],[4,5,6]])    # Create a rank 2 array
print(b.shape)                     # Prints "(2, 3)"
print(b[0, 0], b[0, 1], b[1, 0])   # Prints "1 2 4"

***Some useful functions to create numpy array***



> `Skip this part if you are not familiar with matrices.`




In [None]:
import numpy as np

a = np.zeros((2,2))   # Create an array of all zeros
print(a)              # Prints "[[0. 0.]
                      #          [0. 0.]]"

b = np.ones((1,2))    # Create an array of all ones
print(b)              # Prints "[[1. 1.]]"

c = np.full((2,2), 7)  # Create a constant array (integer, because 7 is an int)
print(c)               # Prints "[[7 7]
                       #          [7 7]]"

d = np.eye(2)         # Create a 2x2 identity matrix
print(d)              # Prints "[[1. 0.]
                      #          [0. 1.]]"

e = np.random.random((2,2))  # Create an array filled with random values in [0, 1)
print(e)                     # Might print "[[0.91940167 0.08143941]
                             #               [0.68744134 0.87236687]]"
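
A few more array-creation functions that often come up, shown as a sketch: `np.arange` (like `range`), `np.linspace` (evenly spaced values), `reshape`, and a seeded random generator (`np.random.default_rng`, NumPy 1.17+) so that "random" arrays are reproducible. The seed value 42 is arbitrary.

```python
import numpy as np

f = np.arange(0, 10, 2)         # like range(): start, stop (excluded), step -> 0, 2, 4, 6, 8
g = np.linspace(0, 1, 5)        # 5 evenly spaced values, both ends included -> 0, 0.25, 0.5, 0.75, 1
h = np.arange(6).reshape(2, 3)  # turn a rank 1 array into shape (2, 3)
print(h)                        # [[0 1 2]
                                #  [3 4 5]]

# Seeded generator: the same "random" values every time the cell runs
rng = np.random.default_rng(seed=42)
e = rng.random((2, 2))          # values in [0, 1)
print(e)
```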

***`Array indexing`***

> `Numpy offers several ways to index into arrays`.

**Slicing**: *Similar to Python lists, numpy arrays can be sliced. Since arrays may be multidimensional, you must specify a slice for each dimension of the array:*





In [None]:
import numpy as np

# Create the following rank 2 array with shape (3, 4)
# [[ 1  2  3  4]
#  [ 5  6  7  8]
#  [ 9 10 11 12]]
a = np.array([[1,2,3,4], [5,6,7,8], [9,10,11,12]])
print("a = \n",a)

# Use slicing to pull out the subarray consisting of the first 2 rows
# and columns 1 and 2; b is the following array of shape (2, 2):
# [[2 3]
#  [6 7]]
b = a[:2, 1:3]
print("b = \n",b)

# A slice of an array is a view into the same data, so modifying it
# will modify the original array.
print(a[0, 1])   # Prints "2"
b[0, 0] = 77     # b[0, 0] is the same piece of data as a[0, 1]
print(a[0, 1])   # Prints "77"

# Be careful when modifying a slice: the change also shows up in the original array
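
Slicing is only one of the ways NumPy can index into arrays. This sketch shows how `.copy()` avoids the shared-data surprise above, plus two other standard indexing styles: integer array indexing and boolean (mask) indexing.

```python
import numpy as np

a = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])

# .copy() gives an independent array, so changes do not leak back into a
b = a[:2, 1:3].copy()
b[0, 0] = 77
print(a[0, 1])          # still 2

# Integer array indexing: pick arbitrary elements by (row, column) pairs
picked = a[[0, 1, 2], [0, 1, 2]]   # elements a[0, 0], a[1, 1], a[2, 2]
print(picked)           # [ 1  6 11]

# Boolean indexing: keep only the elements that satisfy a condition
mask = a > 8
print(a[mask])          # [ 9 10 11 12]
```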

***Performing mathematical operations on arrays: You can perform mathematical operations on arrays using NumPy functions. For example, `a = np.array([1, 2, 3])` and `b = np.array([4, 5, 6])`, then `c = a + b` creates a new array with the values `[5, 7, 9]`***

In [None]:
import numpy as np
a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
c = a + 2*b   # element-wise: [1 + 2*4, 2 + 2*5, 3 + 2*6]

print(c)      # Prints "[ 9 12 15]"
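
The `2*b` above works because NumPy *broadcasts* the scalar 2 to every element of `b`. The same rule lets arrays of different but compatible shapes be combined, as this sketch shows by adding a row of shape `(3,)` to every row of a `(2, 3)` array.

```python
import numpy as np

m = np.array([[1, 2, 3],
              [4, 5, 6]])      # shape (2, 3)
row = np.array([10, 20, 30])   # shape (3,)

# The scalar 2 and the (3,)-shaped row are stretched ("broadcast") to shape (2, 3)
result = 2 * m + row
print(result)   # [[12 24 36]
                #  [18 30 42]]
```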

In [None]:
import numpy as np

x = np.array([[1,2],[3,4]])
y = np.array([[5,6],[7,8]])

# Elementwise sum; both produce the array
# [[ 6  8]
#  [10 12]]
print(x + y)
print(np.add(x, y))

# Elementwise difference; both produce the array
# [[-4 -4]
#  [-4 -4]]
print(x - y)
print(np.subtract(x, y))

# Elementwise product; both produce the array
# [[ 5 12]
#  [21 32]]
print(x * y)
print(np.multiply(x, y))

# Elementwise division (always returns floats); both produce the array
# [[0.2        0.33333333]
#  [0.42857143 0.5       ]]
print(x / y)
print(np.divide(x, y))

# Elementwise square root; produces the array
# [[1.         1.41421356]
#  [1.73205081 2.        ]]
print(np.sqrt(x))
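
Note that `*` multiplies element by element; it is not matrix multiplication. For readers comfortable with matrices, this sketch contrasts the two using the `@` operator (Python 3.5+), which for 2-D arrays gives the same result as `np.dot`.

```python
import numpy as np

x = np.array([[1, 2], [3, 4]])
y = np.array([[5, 6], [7, 8]])

elementwise = x * y     # [[ 5 12]
                        #  [21 32]]
matrix = x @ y          # row-by-column products, same as np.dot(x, y) here
print(matrix)           # [[19 22]
                        #  [43 50]]
```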

***Some common NumPy functions that are widely used in data analysis***
```
mean(): Calculates the mean of an array.

median(): Calculates the median of an array.

std(): Calculates the standard deviation of an array (population version, ddof=0, by default).

var(): Calculates the variance of an array (population version, ddof=0, by default).

min(): Returns the minimum value of an array.

max(): Returns the maximum value of an array.

argmin(): Returns the index of the minimum value in an array.

argmax(): Returns the index of the maximum value in an array.
```



In [None]:
import numpy as np

a = np.array([1, 2, 3, 4, 5])

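print(np.mean(a))     # 3.0
print(np.median(a))   # 3.0
print(np.std(a))      # 1.4142135623730951
print(np.var(a))      # 2.0
print(np.min(a))      # 1
print(np.max(a))      # 5
print(np.argmin(a))   # 0 (index of the value 1)
print(np.argmax(a))   # 4 (index of the value 5)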
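
For 2-D data these functions also accept an `axis` argument: `axis=0` works down each column and `axis=1` across each row. The sketch below also shows `ddof=1` for the sample (rather than population) standard deviation. The array name `scores` is just for illustration.

```python
import numpy as np

scores = np.array([[1, 2, 3],
                   [4, 5, 6]])

print(np.mean(scores))           # 3.5 (all six values)
print(np.mean(scores, axis=0))   # [2.5 3.5 4.5] (down each column)
print(np.max(scores, axis=1))    # [3 6] (across each row)

# Sample standard deviation divides by n - 1 instead of n
print(np.std([1, 2, 3, 4, 5], ddof=1))   # about 1.5811
```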


# ***Pandas***

***Creating a DataFrame: You can create a DataFrame using the `pd.DataFrame()` function. For example, `df = pd.DataFrame({'Name': ['John', 'Mike', 'Sara'], 'Age': [25, 30, 28]})` creates a DataFrame with two columns: Name and Age.***

In [None]:
import pandas as pd
df = pd.DataFrame({'Name': ['John', 'Mike', 'Sara'], 'Age': [25, 30, 28]})
df

***Let's say we have a fruit stand that sells apples and oranges. We want to have a column for each fruit and a row for each customer purchase. To organize this as a dictionary for pandas we could do something like:***

In [None]:
data = {
    'apples': [3, 2, 0, 1],
    'oranges': [0, 3, 7, 2]
}
purchases = pd.DataFrame(data)
purchases

***We can also pass our own row labels to the DataFrame constructor through the `index` argument, here the customers' names:***

In [None]:

purchases = pd.DataFrame(data, index=['June', 'Robert', 'Lily', 'David'])

purchases

***So now we could locate a customer's order by using their name:***

In [None]:
purchases.loc['June']
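
`.loc` selects by index *label*; its sibling `.iloc` selects by integer *position*. A sketch on the same purchases data:

```python
import pandas as pd

data = {'apples': [3, 2, 0, 1], 'oranges': [0, 3, 7, 2]}
purchases = pd.DataFrame(data, index=['June', 'Robert', 'Lily', 'David'])

by_label = purchases.loc['June']                  # select a row by its index label
by_position = purchases.iloc[0]                   # the same row by its integer position
lily_oranges = purchases.loc['Lily', 'oranges']   # a single cell: row label, column name

print(by_label.equals(by_position))   # True
print(lily_oranges)                   # 7
```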

***We can also save a DataFrame to a file using the `to_csv()` method.***

In [None]:
purchases.to_csv('purchases.csv')

***Reading data from a file: You can read data from a file using the `pd.read_csv()` function. For example, `df = pd.read_csv('data.csv')` reads data from a CSV file named data.csv and creates a DataFrame.***



> `CSV files have no separate index; to_csv() wrote our index as an unnamed first column, so we designate that column with index_col when reading:`



In [None]:
df = pd.read_csv('purchases.csv', index_col=0)

df
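
This sketch shows why `index_col` matters, using an in-memory `io.StringIO` buffer instead of a real file so nothing is written to disk. Without `index_col`, the saved index comes back as an ordinary column named `Unnamed: 0`.

```python
import io

import pandas as pd

purchases = pd.DataFrame({'apples': [3, 2, 0, 1], 'oranges': [0, 3, 7, 2]},
                         index=['June', 'Robert', 'Lily', 'David'])

# Write to an in-memory "file"
buffer = io.StringIO()
purchases.to_csv(buffer)
csv_text = buffer.getvalue()
print(csv_text.splitlines()[0])   # ,apples,oranges  (the index became an unnamed first column)

buffer.seek(0)
without_index_col = pd.read_csv(buffer)             # index comes back as 'Unnamed: 0'
buffer.seek(0)
with_index_col = pd.read_csv(buffer, index_col=0)   # first column restored as the index

print(without_index_col.columns.tolist())   # ['Unnamed: 0', 'apples', 'oranges']
print(with_index_col.equals(purchases))     # True
```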

***Now we will look into some important operations that we can perform on a DataFrame to prepare data.***



> `We are going to use a data file from IMDB which contains information about 1000 movies.`



In [None]:
# Let's first read the data and import it to a dataframe
movies_df = pd.read_csv('IMDB-Movie-Data.csv', index_col='Title')
movies_df

***The first thing to do when opening a new dataset is print out a few rows to keep as a visual reference. We accomplish this with `.head()`:***


> ***`.head()` outputs the first five rows of your DataFrame by default, but we could also pass a number as well: `movies_df.head(10)` would output the top ten rows.***

> ***To see the last five rows use `.tail()`. `tail()` also accepts a number.***

In [None]:
movies_df.head(2)

***Typically when we load in a dataset, we like to view the first five or so rows to see what's under the hood. Here we can see the names of each column, the index, and examples of values in each row.***

***You'll notice that the index in our DataFrame is the Title column, which you can tell by how the word Title is slightly lower than the rest of the columns.***

***`.info()` should be one of the very first commands you run after loading your data:***

In [None]:
movies_df.info()

***`.info()` provides the essential details about your dataset, such as the number of rows and columns, the number of non-null values, what type of data is in each column, and how much memory your DataFrame is using.***

>***Notice in our movies dataset we have some obvious missing values in the Revenue and Metascore columns. We'll look at how to handle those in a bit.***


***Another fast and useful attribute is `.shape`, which outputs in the form of (rows, columns):***

In [None]:
movies_df.shape

***Handling duplicates:***
>***This dataset does not have duplicate rows, but it is always important to verify you aren't aggregating duplicate rows.***

***To demonstrate, let's double up our movies DataFrame by concatenating it to itself with the `pd.concat()` function:***

In [None]:
temp_df = pd.concat([movies_df,movies_df])

temp_df.shape

***To remove duplicate rows from our DataFrame, we can use the `.drop_duplicates()` method:***

In [None]:
temp_df = temp_df.drop_duplicates()

temp_df.shape
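
On a tiny made-up table, this sketch shows how to *see* duplicates with `.duplicated()` before dropping them, and how the `keep` argument chooses which copy survives. Note that `drop_duplicates()` compares column values; the index is not taken into account.

```python
import pandas as pd

# A tiny, made-up table with one repeated row
df = pd.DataFrame({'title': ['Up', 'Cars', 'Up'], 'year': [2009, 2006, 2009]})

print(df.duplicated().tolist())   # [False, False, True]: the third row repeats the first

deduped = df.drop_duplicates()                   # keeps the first occurrence by default
deduped_last = df.drop_duplicates(keep='last')   # or keep the last occurrence instead
print(deduped.index.tolist())        # [0, 1]
print(deduped_last.index.tolist())   # [1, 2]
```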

***Column cleanup: Many times datasets will have verbose column names with symbols, upper and lowercase words, spaces, and typos. To make selecting data by column name easier we can spend a little time cleaning up their names. We can use the `.columns` attribute to see the list of columns in our dataset.***

In [None]:
movies_df.columns

***We can use the `.rename()` method to rename certain or all columns. We don't want parentheses, so let's rename those:***

In [None]:
movies_df.rename(columns={
        'Runtime (Minutes)': 'Runtime',
        'Revenue (Millions)': 'Revenue_millions'
    }, inplace=True)

# inplace=True: rename the columns of movies_df itself instead of
#               returning a renamed copy.

movies_df.columns

***Excellent. But what if we want to lowercase all names? Instead of using `rename()` we could also set a list of names to the columns like so:***

In [None]:
movies_df.columns = ['rank', 'genre', 'description', 'director', 'actors', 'year', 'runtime',
                     'rating', 'votes', 'revenue_millions', 'metascore']


movies_df.columns

***How to work with missing values: When exploring data, you’ll most likely encounter missing or null values, which are essentially placeholders for non-existent values. Most commonly you'll see Python's `None` or NumPy's `np.nan`, each of which is handled differently in some situations.***

***There are two options in dealing with nulls:***
  1. ***Get rid of rows or columns with nulls***
  2. ***Replace nulls with non-null values, a technique known as imputation***
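
A sketch of both options on a small made-up table. For imputation it fills nulls with the column mean, which is only one common choice; a median or a domain-specific value may suit other data better.

```python
import numpy as np
import pandas as pd

# Made-up example with missing revenue values
df = pd.DataFrame({'rating': [8.1, 7.0, 6.5, 7.9],
                   'revenue_millions': [120.0, np.nan, 40.0, np.nan]})

print(df.isnull().sum())   # revenue_millions has 2 nulls

# Option 1: drop the rows that contain any null
dropped = df.dropna()      # returns a new DataFrame; df is unchanged
print(dropped.shape)       # (2, 2)

# Option 2: impute, here by filling nulls with the column mean
mean_revenue = df['revenue_millions'].mean()   # NaN values are skipped -> 80.0
imputed = df.fillna({'revenue_millions': mean_revenue})
print(imputed['revenue_millions'].tolist())    # [120.0, 80.0, 40.0, 80.0]
```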

***Let's calculate the total number of nulls in each column of our dataset. The first step is to check which cells in our DataFrame are null:***

In [None]:
# movies_df.isnull()     # returns a DataFrame of True/False values, True where a cell is null
movies_df.isnull().sum() # count the nulls in each column

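***We can drop the rows that contain null values with `.dropna()`. It returns a new DataFrame and leaves `movies_df` unchanged unless we reassign the result or pass `inplace=True`.***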

In [None]:
movies_df.dropna()

***We can get summary statistics (count, mean, standard deviation, min, quartiles, and max) for each numeric column by using the `.describe()` method.***

In [None]:
movies_df.describe()

***Relationships between continuous variables: The correlation method `.corr()` computes the correlation between every pair of continuous (numeric) columns:***

> Positive numbers indicate a positive correlation (when one goes up, the other goes up), and negative numbers represent an inverse correlation (when one goes up, the other goes down). 1.0 indicates a perfect positive correlation and -1.0 a perfect negative one.



In [None]:
movies_df.corr(numeric_only=True)   # skip text columns such as genre and director (pandas >= 1.5)
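
A small made-up table makes the meaning of the numbers easy to check: a column that rises exactly with `x` correlates at 1.0, one that falls exactly as `x` rises correlates at -1.0. `numeric_only=True` (pandas 1.5+) leaves out the text column, which newer pandas versions would otherwise refuse to correlate.

```python
import pandas as pd

df = pd.DataFrame({
    'x': [1, 2, 3, 4],
    'double_x': [2, 4, 6, 8],       # rises exactly with x
    'minus_x': [-1, -2, -3, -4],    # falls exactly as x rises
    'label': ['a', 'b', 'c', 'd'],  # text column, skipped by numeric_only=True
})

corr = df.corr(numeric_only=True)
print(corr.round(2))
```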

***Filtering data: You can filter data in a DataFrame using boolean indexing. For example, `df[df['Age'] > 25]` returns all rows where the Age column is greater than 25.***

In [None]:
movies_df[movies_df['year'] > 2014].head(5)

***Let's say we want all movies that were released between 2005 and 2010, have a rating above 8.0, but made below the 25th percentile in revenue. Here's how we could do all of that:***

In [None]:
movies_df[
    ((movies_df['year'] >= 2005) & (movies_df['year'] <= 2010))
    & (movies_df['rating'] > 8.0)
    & (movies_df['revenue_millions'] < movies_df['revenue_millions'].quantile(0.25))
]
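
The year-range part of that filter can also be written with `Series.between()` (inclusive on both ends by default), or the whole condition as a string with `DataFrame.query()`. This sketch uses a made-up table with the same column names as `movies_df`.

```python
import pandas as pd

# Made-up movies table using the same column names as movies_df
movies = pd.DataFrame({
    'year': [2004, 2006, 2008, 2012],
    'rating': [8.5, 8.3, 7.2, 8.4],
}, index=['A', 'B', 'C', 'D'])

# between() is inclusive on both ends by default, like >= and <=
in_range = movies['year'].between(2005, 2010)
filtered = movies[in_range & (movies['rating'] > 8.0)]
print(filtered.index.tolist())   # ['B']

# The same filter written with DataFrame.query
queried = movies.query('2005 <= year <= 2010 and rating > 8.0')
print(queried.index.tolist())    # ['B']
```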

# ***matplotlib***

***Creating a line plot: You can create a line plot using the `plot()` function. For example, `plt.plot([1, 2, 3, 4], [1, 4, 9, 16])` creates a line plot with the x-axis values `[1, 2, 3, 4]` and the y-axis values `[1, 4, 9, 16]`***

In [None]:
import matplotlib.pyplot as plt
x = [1, 2, 3, 4]
y = [1, 4, 9, 16]
plt.plot(x, y)

***Matplotlib offers many options to customize plots as the user sees fit.***

In [None]:
import numpy as np

# evenly sampled time at 200ms intervals
t = np.arange(0., 5., 0.2) # values from 0.0 up to (but not including) 5.0 in steps of 0.2

# red dashes, blue squares and green triangles
plt.plot(t, t, 'r--', t, t**2, 'bs', t, t**3, 'g^')
plt.xlabel('X')
plt.ylabel('Y')
plt.legend(['x','x**2','x**3'])
plt.show()

[scatter plots](https://www.nytimes.com/2022/09/08/learning/whats-going-on-in-this-graph-sept-14-2022.html)

[pie plots](https://www.wired.com/2015/07/people-dont-see-social-media-important-news-source/)

[others](https://static01.nyt.com/images/2020/06/09/learning/LN-image-WGOTIG/LN-image-WGOTIG-jumbo.png?quality=75&auto=webp)

***Creating a scatter plot: You can create a scatter plot using the `scatter()` function. For example, `plt.scatter([1, 2, 3, 4], [1, 4, 9, 16])` creates a scatter plot with the x-axis values `[1, 2, 3, 4]` and the y-axis values `[1, 4, 9, 16]`***

In [None]:
import matplotlib.pyplot as plt
x = [1, 2, 3, 4]
y = [1, 4, 9, 16]
plt.scatter(x, y)

***You can use matplotlib to easily and efficiently create almost any type of plot (e.g., bar charts, histograms, pie charts). The code structure remains similar.***



***Creating a bar chart: You can create a bar chart using the `bar()` function. For example, `plt.bar(['A', 'B', 'C'], [10, 20, 30])` creates a bar chart with three bars labeled A, B and C***

In [None]:
x = ['A', 'B', 'C']
y = [10, 20, 30]
plt.bar(x, y)

***Creating a histogram: You can create a histogram using the `hist()` function. For example, `plt.hist([1, 2, 3], bins=5)` creates a histogram with five bins.***

In [None]:
x = np.random.rand(10)
plt.hist(x, bins=10)

***Creating a pie chart: You can create a pie chart using the `pie()` function. For example, `plt.pie([10, 20, 30], labels=['A', 'B', 'C'])` creates a pie chart with three slices labeled A, B and C.***

In [None]:
plt.pie([10, 20, 30], labels=['A', 'B', 'C'])
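
The cells above use `plt.*` functions, which draw on the "current" figure. Matplotlib also has an object-oriented interface in which you hold on to the figure and axes explicitly; this is handy when a figure has several panels. A sketch that puts two of the plot types above side by side and saves the result (to an in-memory buffer here; pass a filename such as `'plots.png'`, a hypothetical name, to save to disk instead):

```python
import io

import matplotlib.pyplot as plt

x = [1, 2, 3, 4]
y = [1, 4, 9, 16]

# One figure with two side-by-side axes
fig, (ax_line, ax_bar) = plt.subplots(1, 2, figsize=(8, 3))
ax_line.plot(x, y, 'r--')
ax_line.set_title('line')
ax_line.set_xlabel('x')
ax_line.set_ylabel('x**2')
ax_bar.bar(['A', 'B', 'C'], [10, 20, 30])
ax_bar.set_title('bar')
fig.tight_layout()

# Save the whole figure as PNG bytes, then release its memory
buffer = io.BytesIO()
fig.savefig(buffer, format='png')
plt.close(fig)
```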

# ***Air Quality and Visualization***

[Air quality data from purple air sensor on Engineering Hall rooftop](https://kmmukut.github.io/EntangledAir)

# ***Working with actual data file***


In [None]:
# Hourly averaged data for a full day
data = pd.read_csv('20230416.csv')

In [None]:
data

In [None]:
data.describe()

In [None]:
hour = np.array(data['hour'])
pm10 = np.array(data['PM10'])

plt.plot(hour, pm10)
plt.xlabel('hour')
plt.ylabel('PM10')
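
Once the data is in a DataFrame, simple questions can be answered directly, for example which hour had the highest PM10 reading. The helper `peak_hour` below is a hypothetical sketch that assumes the `hour` and `PM10` columns used above; the demo numbers are made up only to show the call, and in practice you would pass the real `data` DataFrame.

```python
import pandas as pd


def peak_hour(data, column='PM10'):
    """Return (hour, value) for the row where `column` is largest.

    Assumes the DataFrame has an 'hour' column, as in 20230416.csv above.
    """
    idx = data[column].idxmax()          # index label of the largest value
    return data.at[idx, 'hour'], data.at[idx, column]


# Made-up numbers, only to show the call; use peak_hour(data) on the real file
demo = pd.DataFrame({'hour': [0, 1, 2, 3], 'PM10': [4.0, 9.5, 7.2, 3.1]})
hour, value = peak_hour(demo)
print(f"Highest PM10 at hour {hour}: {value}")
```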