# NumPy

### 1. What is NumPy? What is the importance and use of NumPy?

NumPy is a Python library used for working with arrays. NumPy stands for Numerical Python.

In Python, lists serve the purpose of arrays, but they are slow to process for numerical work. NumPy provides an array object, ndarray, that stores elements of a single type in one contiguous block of memory, which makes many operations much faster than on traditional Python lists (a figure of up to 50x is often quoted, though the actual speedup depends on the operation and the data size).

It also has functions for linear algebra, Fourier transforms, and matrices, as well as tools for integrating C/C++ and Fortran code.


### 2. What is Vector / Matrix / Tensor?

#### Scalar
A scalar is just a single number.

#### Vector
A vector is a one-dimensional array of numbers.

#### Matrix
A matrix is a 2-D array of numbers, so each element is identified by two indices instead of just one.

#### Tensor
An array with more than two axes is called a tensor. We can create a 3-dimensional array, or tensor, with the np.array() function.
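
The four definitions above can be sketched with np.array(). The variable names are only illustrative; ndim reports the number of axes and shape the length along each axis.

```python
import numpy as np

scalar = np.array(5)                    # 0 axes: a single number
vector = np.array([1, 2, 3])            # 1 axis
matrix = np.array([[1, 2], [3, 4]])     # 2 axes: indexed by [row, column]
tensor = np.array([[[1, 2], [3, 4]],
                   [[5, 6], [7, 8]]])   # 3 axes

for name, a in [("scalar", scalar), ("vector", vector),
                ("matrix", matrix), ("tensor", tensor)]:
    print(name, "ndim =", a.ndim, "shape =", a.shape)

# A matrix element needs two indices, a tensor element needs three
print(matrix[1, 0])
print(tensor[1, 0, 1])
```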

### 3. What is the difference between array and np.array?

#### Arrays

    Need to import the standard library's array module
    Can only store objects of the same type
    Supports only one-dimensional arrays
    Can perform similar operations as lists, except for the type restriction
    You can create an array by specifying a type code in the constructor array.array(). 

In [1]:
import array
array1 = array.array('i', [1, 2])
print(array1)
print(array1.typecode)
print(array1.itemsize)

array('i', [1, 2])
i
4


Here the typecode 'i' specifies that the objects stored in the array are signed integers, and itemsize shows that each one occupies 4 bytes on this machine.

#### numpy.ndarray

    Need to install and import NumPy
    Stores elements of a single dtype
    Can hold references to arbitrary Python objects when the dtype is object
    Can be cast to a different dtype with astype()
    Can represent multi-dimensional arrays
    Offers numerous methods and functions for numerical computation

In [2]:
import numpy as np
arr = np.array([0, 1, 2])
print(arr)

[0 1 2]


An array.array gives tight control over memory because it restricts the type of elements it can store; if memory use is not a concern, a list is usually the simpler choice. In my opinion, array.array is rarely useful outside of cases that need control over memory size or access to the underlying memory address.

For handling multi-dimensional arrays or performing numerical computations (scientific and technical operations) and matrix operations on arrays, use numpy.ndarray.
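
A short sketch of the differences listed above, using made-up sample values: the same operator behaves differently on the two types, and only ndarray supports several dimensions, astype() casts, and the object dtype.

```python
import array
import numpy as np

py_arr = array.array('i', [1, 2, 3])       # standard library, one-dimensional only
np_arr = np.array([[1, 2, 3], [4, 5, 6]])  # NumPy, two-dimensional here

repeated = py_arr * 2   # repeats the sequence, as a list would
doubled = np_arr * 2    # multiplies every element

as_float = np_arr.astype(np.float64)                 # cast to another dtype
mixed = np.array([1, "two", 3.0], dtype=object)      # references to Python objects

print(repeated)
print(doubled)
print(as_float.dtype, mixed.dtype)
```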

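### 4. What is view?

numpy.ndarray.view() returns a new array object that looks at the same data as the original array. No data is copied, so a change made through either object is visible through the other.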

Syntax: ndarray.view(dtype=None, type=None)

In [3]:
import numpy as np

arr = np.array([1, 2, 3, 4, 5])
x = arr.view()
arr[0] = 42

print(arr)
print(x)

[42  2  3  4  5]
[42  2  3  4  5]
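
To make the contrast explicit, here is a small sketch that puts a view next to a copy of the same array. The base attribute and np.shares_memory() are used only to inspect who owns the data.

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5])
v = arr.view()   # shares arr's data
c = arr.copy()   # owns an independent copy of the data

arr[0] = 42

print(v[0])                        # the view sees the change
print(c[0])                        # the copy keeps the old value
print(v.base is arr)               # a view records the array it looks at
print(c.base is None)              # a copy owns its data
print(np.shares_memory(arr, v), np.shares_memory(arr, c))
```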


### 5. What is masking?

A mask is an array of boolean values produced by applying a condition to another array. True marks the elements that satisfy the condition and False marks those that do not; indexing the original array with the mask returns only the elements of interest.

Masking makes it easy to handle missing, invalid, or unwanted entries in an array or dataset/DataFrame. NumPy also provides masked arrays (the numpy.ma module), which are arrays that carry such a mask alongside their data so that invalid or missing entries are skipped in computations.
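
A minimal sketch of both ideas, with sample values chosen only for illustration: a boolean mask built from a condition, and a masked array that hides an invalid entry.

```python
import numpy as np

values = np.array([3, -1, 7, 2, 10])

mask = values > 2          # boolean array, one entry per element
selected = values[mask]    # keeps only the elements where mask is True

print(mask)
print(selected)

# Masked array: the NaN entry is marked invalid and ignored by mean()
readings = np.array([1.0, np.nan, 3.0])
masked = np.ma.masked_invalid(readings)

print(masked.mask)
print(masked.mean())
```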

----------------------

# Pandas

### 1. What is pandas? What are the significant features of the pandas library?

Pandas is a Python library used for working with data sets.

    It has functions for analyzing, cleaning, exploring, and manipulating data.
    Pandas allows us to analyze big data and make conclusions based on statistical theories.
    Pandas can clean messy data sets, and make them readable and relevant.
    Pandas provides time-series functionality.

### 2. What is a Series in Pandas?

Pandas Series is a one-dimensional labeled array capable of holding data of any type. 

In [4]:
import pandas as pd

a = [1, 7, 2]
s = pd.Series(a)
s

0    1
1    7
2    2
dtype: int64

### 3. How will you create a series from dict in Pandas?

In [5]:
import pandas as pd

calories = {"day1": 420, "day2": 380, "day3": 390}
s = pd.Series(calories)
s

day1    420
day2    380
day3    390
dtype: int64
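
As a small extension of the example above, an index argument can be passed together with the dict. In this sketch only the listed keys are kept, and a key that is not in the dict ("day4" here, a made-up label) produces NaN.

```python
import pandas as pd

calories = {"day1": 420, "day2": 380, "day3": 390}

# Keep only some of the dict's keys
subset = pd.Series(calories, index=["day1", "day2"])
print(subset)

# A label missing from the dict becomes NaN
with_missing = pd.Series(calories, index=["day1", "day4"])
print(with_missing)
```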

### 4. How can we create a copy of the series in Pandas?

The copy() method returns a copy of the Series.

By default, the copy is a "deep copy", meaning that the data and index are copied, so changes made to the original Series will NOT be reflected in the copy, and vice versa.

**Syntax:**
    Series.copy(deep=True)

In [6]:
import pandas as pd

index = list("WXYZ")
series = pd.Series([98,23,43,45], index=index)      #create a pandas Series
print('series')
print(series)

copy_sr = series.copy()    # create a copy
print("Copied series object:")
print(copy_sr)
copy_sr['W'] = 55    # update a value

print("objects after updating a value: ")
print(copy_sr)
print('series')
print(series)

series
W    98
X    23
Y    43
Z    45
dtype: int64
Copied series object:
W    98
X    23
Y    43
Z    45
dtype: int64
objects after updating a value: 
W    55
X    23
Y    43
Z    45
dtype: int64
series
W    98
X    23
Y    43
Z    45
dtype: int64


### 5. What is a pandas DataFrame? How will you create an empty DataFrame in Pandas?

A Pandas DataFrame is a 2 dimensional data structure, like a 2 dimensional array, or a table with rows and columns.

In [7]:
import pandas as pd

df = pd.DataFrame() 
print(df)

Empty DataFrame
Columns: []
Index: []
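
An empty DataFrame can also be given column names up front. The sketch below uses hypothetical column names and checks emptiness with the empty and shape attributes.

```python
import pandas as pd

# No rows yet, but the columns are already defined
df_named = pd.DataFrame(columns=["Name", "Age"])

print(df_named.empty)   # True when the DataFrame has no rows
print(df_named.shape)   # (rows, columns)
print(list(df_named.columns))
```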


### 6. How will you add a column to a pandas DataFrame?

Different ways of adding a new column to an existing data frame:

In [8]:
df = pd.DataFrame({"A": [1, 2, 3, 4],
                   "B": [5, 6, 7, 8]})

#### By declaring a new list as a column

In [9]:
df["C"] = [10, 20, 30, 40]

#### By using DataFrame.insert()

In [10]:
df.insert(1, "D", 5)

#### By using the DataFrame.assign() method

In [11]:
df = df.assign(F = df.C * 10)

#### By using the dictionary data structure

In [12]:
e = {'a':1, 'b':2, 'c':3, 'd':4}
# Assigning a dict directly fills the column with the dict's keys, not its values
df['E'] = e
df

   A  D  B   C    F  E
0  1  5  5  10  100  a
1  2  5  6  20  200  b
2  3  5  7  30  300  c
3  4  5  8  40  400  d


### 7. How to Rename the Index or Columns of a Pandas DataFrame?

In [13]:
df = pd.DataFrame({'a': [1, 2], 'b': [3, 4]}) 
df2 = df.rename(columns={df.columns[1]: 'new'})

print(df2)

   a  new
0  1    3
1  2    4
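
The cell above renames a column. Index labels can be renamed the same way through the index argument of rename(); the new labels in this sketch are arbitrary.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2], 'b': [3, 4]})

# rename() returns a new DataFrame; df itself is left unchanged
df3 = df.rename(index={0: 'first', 1: 'second'}, columns={'a': 'A'})

print(df3)
print(df)
```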


### 8. How to iterate over a Pandas DataFrame?

In [14]:
import pandas as pd
data = {'Name': ['Ankit', 'Amit', 'Aishwarya', 'Priyanka'], 'Age': [21, 19, 20, 18], 
        'Stream': ['Math', 'Commerce', 'Arts','Biology'], 'Percentage': [88, 92, 95, 70]}
df = pd.DataFrame(data, columns=['Name', 'Age', 'Stream', 'Percentage'])
df

        Name  Age    Stream  Percentage
0      Ankit   21      Math          88
1       Amit   19  Commerce          92
2  Aishwarya   20      Arts          95
3   Priyanka   18   Biology          70


#### Using the apply() method of the DataFrame:

In [None]:
%%timeit 
df.apply(lambda row: row["Name"] + " " + str(row["Percentage"]), axis=1)

#### By using Pandas vectorization:

In [None]:
%%timeit 
df["Name"] + " " + df["Stream"]

#### By using NumPy vectorization

In [None]:
%%timeit 
df["Name"].to_numpy()  + " " + df["Stream"].to_numpy() 

https://towardsdatascience.com/efficiently-iterating-over-rows-in-a-pandas-dataframe-7dd5f9992c01
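
When an explicit row-by-row loop is really needed, itertuples() is one option. The sketch below uses a shortened version of the data above and checks that the loop gives the same result as the vectorized expression; for large frames the vectorized form is generally the faster of the two.

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Ankit", "Amit"],
                   "Stream": ["Math", "Commerce"]})

# Explicit loop: each row arrives as a namedtuple with one field per column
labels = [f"{row.Name} {row.Stream}" for row in df.itertuples(index=False)]

# Vectorized equivalent: operates on whole columns at once
vectorized = (df["Name"] + " " + df["Stream"]).tolist()

print(labels)
print(labels == vectorized)
```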

### 9. What is Pandas NumPy array?

For most data types, pandas uses NumPy arrays as the concrete objects contained within an Index, Series, or DataFrame.
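
The underlying values can be obtained as a NumPy array with to_numpy(). A minimal sketch with sample data:

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 7, 2])
df = pd.DataFrame({"A": [1, 2], "B": [3, 4]})

values = s.to_numpy()    # one-dimensional ndarray
matrix = df.to_numpy()   # two-dimensional ndarray, one row per DataFrame row

print(type(values), values)
print(matrix.shape)
```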

### 10. Define GroupBy in Pandas?

A groupby operation involves some combination of splitting the object, applying a function, and combining the results. 

**Syntax :**
    dataframe.groupby()
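
The split, apply, and combine steps can be sketched as follows. The data is made up for illustration: rows are split by Stream, the mean is applied to each group, and the results are combined into one Series.

```python
import pandas as pd

df = pd.DataFrame({"Stream": ["Math", "Arts", "Math", "Arts"],
                   "Percentage": [88, 95, 92, 70]})

# Split by Stream, apply mean() to each group, combine into a Series
mean_by_stream = df.groupby("Stream")["Percentage"].mean()
print(mean_by_stream)

# Several aggregations at once give a DataFrame with one column per function
summary = df.groupby("Stream")["Percentage"].agg(["min", "max"])
print(summary)
```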

### 11. How To Write a Pandas DataFrame to a File

It is common practice in data analysis to export data from Pandas DataFrames into CSV files, because CSV is a plain-text format that almost every tool can read, so results can be shared and reloaded without recomputing them.

**Syntax :**
DataFrame.to_csv(filename, sep=',', index=False, encoding='utf-8')

The .to_csv() method is a built-in function in Pandas that allows you to save a Pandas DataFrame as a CSV file. 
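
A small round-trip sketch with sample data: when no filename is given, to_csv() returns the CSV text as a string, which keeps the example free of files on disk. Passing a path instead, for example a hypothetical "output.csv", writes the same text to that file.

```python
import io
import pandas as pd

df = pd.DataFrame({"A": [1, 2], "B": [3, 4]})

# index=False leaves the row labels out of the output
csv_text = df.to_csv(index=False)
print(csv_text)

# Read the text back to confirm nothing was lost
restored = pd.read_csv(io.StringIO(csv_text))
print(restored.equals(df))
```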

**Other Ways to Save Pandas DataFrames :**

**to_excel():** This method is used to save a DataFrame as an Excel file.
    
**to_json():** This method is used to save a DataFrame as a JSON file.
    
**to_hdf():** This method is used to save a DataFrame as an HDF5 file, which is a hierarchical data format commonly used in scientific computing.
    
**to_sql():** This method is used to save a DataFrame to a SQL database.
    
**to_pickle():** This method is used to save a DataFrame as a pickled object, which is a serialized representation of the DataFrame.

----------

# Matplotlib

### 1. What is the purpose of matplotlib? Is matplotlib part of numpy?

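Matplotlib is a visualization library for Python, used mainly for 2D plots of arrays.

It is a multi-platform data visualization library built on NumPy arrays and designed to work with the broader SciPy stack. Matplotlib is not part of NumPy: it is a separate package that depends on NumPy and has to be installed on its own.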

### 2. What is the use of pyplot in matplotlib? How to import matplotlib.pyplot?

Pyplot is a Matplotlib module that provides a MATLAB-like interface. Matplotlib is designed to be as usable as MATLAB, with the ability to use Python and the advantage of being free and open-source. 

Each pyplot function makes some changes to a figure: e.g., creates a figure, creates a plotting area in a figure, plots some lines in a plotting area, decorates the plot with labels, etc. The various plots we can utilize using Pyplot are Line Plot, Histogram, Scatter, 3D Plot, Image, Contour, and Polar.

In [None]:
import matplotlib.pyplot as plt
import numpy as np

xpoints = np.array([0, 6])
ypoints = np.array([0, 250])

plt.plot(xpoints, ypoints)
plt.show()
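
To illustrate how each pyplot call changes the current figure, the sketch below adds labels, a title, and a legend step by step. The data is arbitrary, and the Agg backend line is an assumption for running as a script without a display; it can be left out inside a notebook.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend for headless runs
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 6, 50)
y = x ** 2

plt.figure()                        # creates a figure
plt.plot(x, y, label="y = x^2")     # plots a line in the plotting area
plt.xlabel("x")                     # decorates the plot with labels
plt.ylabel("y")
plt.title("Quadratic curve")
plt.legend()

ax = plt.gca()                      # the axes that the calls above modified
print(ax.get_title(), len(ax.lines))
```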

### 3. Which is better matplotlib or seaborn? Does seaborn need matplotlib?

Seaborn is built on top of Matplotlib, so it needs Matplotlib to be installed; it is a higher-level layer over Matplotlib rather than a replacement for it.

Seaborn is often the more convenient choice for statistical data visualization in data science: it offers a simple and intuitive API, attractive default styles, direct support for pandas DataFrames, and built-in functions for common statistical plots. Matplotlib remains the better tool when a plot needs fine-grained, low-level customization, and the two can be used together on the same figure.

### 4. Why do we need matplotlib inline?

The %matplotlib inline magic command tells Jupyter to display Matplotlib plots directly in the output of the notebook, below the cell that produced them. One of its major benefits is that it allows you to quickly and easily visualize your data without having to save plots as image files or open them in a separate window.

-------------