## Assignment 3: Python Libraries for Data Analysis and Visualization

### Numpy

**abs(…)** *(ufunc)*

-   Element-wise absolute value. `np.abs` is an alias for `np.absolute`,
    and Python's built-in `abs(arr)` gives the same result on an array.

``` python
import numpy as np
arr = np.array([-1, 2, -3])
abs_arr = abs(arr)
print(abs_arr)
```

output:

    [1 2 3]

**add(…)** *(ufunc)*

-   Element-wise addition: `np.add(arr1, arr2)` is equivalent to
    `arr1 + arr2`.

``` python
import numpy as np
arr1 = np.array([1, 2, 3])
arr2 = np.array([4, 5, 6])
sum_arr = arr1 + arr2
print(sum_arr)
```

output:

    [5 7 9]

**all(…)** *(method of ndarray)*

-   Tests whether all array elements along a given axis evaluate to
    True.

``` python
import numpy as np
arr = np.array([[True, False], [True, True]])
all_elements = arr.all() # All elements in the array
print(all_elements)
all_axis0 = arr.all(axis=0) # Along axis 0 (columns)
print(all_axis0)
```

output:

    False
    [ True False]

**any(…)** *(method of ndarray)*

-   Tests whether any array elements along a given axis evaluate to
    True.

``` python
import numpy as np
arr = np.array([[False, False], [True, False]])
any_elements = arr.any() # Any element in the array
print(any_elements)
any_axis1 = arr.any(axis=1) # Along axis 1 (rows)
print(any_axis1)
```

output:

    True
    [False  True]

**argmax(…)** *(function)*

-   Returns the indices of the maximum values along an axis.

``` python
import numpy as np
arr = np.array([1, 5, 2, 8, 3])
max_index = np.argmax(arr) # Index of max value in flattened array
print(max_index)
arr_2d = np.array([[1, 5, 2], [8, 3, 9]])
max_index_axis0 = np.argmax(arr_2d, axis=0) # Max indices along axis 0 (columns)
print(max_index_axis0)
```

output:

    3
    [1 0 1]

**argmin(…)** *(function)*

-   Returns the indices of the minimum values along an axis.

``` python
import numpy as np
arr = np.array([5, 1, 8, 2, 9])
min_index = np.argmin(arr) # Index of min value in flattened array
print(min_index)
arr_2d = np.array([[5, 1, 8], [2, 9, 1]])
min_index_axis1 = np.argmin(arr_2d, axis=1) # Min indices along axis 1 (rows)
print(min_index_axis1)
```

output:

    1
    [1 2]

**argsort(…)** *(function)*

-   Returns the indices that would sort an array.

``` python
import numpy as np
arr = np.array([3, 1, 4, 2])
sorted_indices = np.argsort(arr)
print(sorted_indices) # Indices to sort arr
sorted_arr = arr[sorted_indices] # Applying indices to sort
print(sorted_arr)
```

output:

    [1 3 0 2]
    [1 2 3 4]

**astype(…)** *(method of ndarray)*

-   Returns a copy of the array cast to a specified type. Casting floats
    to integers truncates toward zero rather than rounding.

``` python
import numpy as np
arr = np.array([1.1, 2.9, 3.5])
int_arr = arr.astype(int) # Convert to integer type
print(int_arr)
string_arr = arr.astype(str) # Convert to string type
print(string_arr)
```

output:

    [1 2 3]
    ['1.1' '2.9' '3.5']
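
The example above only uses positive values, where truncation and rounding are easy to confuse. A small extra sketch (the array values are chosen here for illustration) makes the difference visible by rounding with `np.round` before casting:

``` python
import numpy as np

arr = np.array([1.1, 2.9, -3.5])
truncated = arr.astype(int)          # fractional part dropped, toward zero: 1, 2, -3
rounded = np.round(arr).astype(int)  # round first (halves go to the nearest even integer): 1, 3, -4
print(truncated)
print(rounded)
```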

**concatenate(…)** *(function)*

-   Join a sequence of arrays along an existing axis.

``` python
import numpy as np
arr1 = np.array([[1, 2], [3, 4]])
arr2 = np.array([[5, 6]])
concatenated_arr_axis0 = np.concatenate((arr1, arr2), axis=0) # Along axis 0 (rows)
print(concatenated_arr_axis0)
arr3 = np.array([[7], [8]])
concatenated_arr_axis1 = np.concatenate((arr1, arr3), axis=1) # Along axis 1 (columns)
print(concatenated_arr_axis1)
```

output:

    [[1 2]
     [3 4]
     [5 6]]
    [[1 2 7]
     [3 4 8]]

**copy(…)** *(method of ndarray)*

-   Return a copy of the array.

``` python
import numpy as np
arr = np.array([1, 2, 3])
copied_arr = arr.copy()
copied_arr[0] = 10 # Modifying copy doesn't affect original
print(arr)
print(copied_arr)
```

output:

    [1 2 3]
    [10  2  3]
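
`copy()` matters because basic slicing does not copy. A minimal sketch (variable names are illustrative) contrasting a slice, which is a view on the same data, with a real copy; `np.shares_memory` reports whether two arrays overlap in memory:

``` python
import numpy as np

arr = np.array([1, 2, 3])
view = arr[:]                  # basic slicing returns a view on arr's data
view[0] = 99                   # this write is visible through arr
independent = arr.copy()       # owns its own data
independent[1] = -1            # arr is not affected
print(arr)                                  # [99  2  3]
print(np.shares_memory(arr, view))          # True
print(np.shares_memory(arr, independent))   # False
```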

**cos(…)** *(ufunc)*

-   Element-wise cosine of angles in radians. Since `np.pi` only
    approximates π, cos(π/2) comes out as roughly 6.12e-17 rather than
    exactly 0.

``` python
import numpy as np
arr = np.array([0, np.pi/2, np.pi])
cos_arr = np.cos(arr)
print(cos_arr)
```

output:

    [ 1.000000e+00  6.123234e-17 -1.000000e+00]

**cumsum(…)** *(method of ndarray)*

-   Return the cumulative sum of the elements along a given axis.

``` python
import numpy as np
arr = np.array([[1, 2, 3], [4, 5, 6]])
cumsum_arr = arr.cumsum() # Cumulative sum of all elements
print(cumsum_arr)
cumsum_axis0 = arr.cumsum(axis=0) # Cumulative sum along axis 0 (down each column)
print(cumsum_axis0)
```

output:

    [ 1  3  6 10 15 21]
    [[1 2 3]
     [5 7 9]]

**dot(…)** *(function)*

-   Dot product of two arrays. For 2-D arrays it is equivalent to matrix
    multiplication.

``` python
import numpy as np
arr1 = np.array([[1, 2], [3, 4]])
arr2 = np.array([[5, 6], [7, 8]])
dot_product = np.dot(arr1, arr2) # Matrix multiplication
print(dot_product)
vec1 = np.array([1, 2, 3])
vec2 = np.array([4, 5, 6])
dot_vectors = np.dot(vec1, vec2) # Vector dot product
print(dot_vectors)
```

output:

    [[19 22]
     [43 50]]
    32
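
For 2-D arrays the `@` operator (`np.matmul`) gives the same matrix product as `np.dot`, while `*` multiplies element by element. A quick check reusing the matrices from the example:

``` python
import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6], [7, 8]])
product = a @ b          # matrix product: rows (19, 22) and (43, 50)
elementwise = a * b      # element-wise product: rows (5, 12) and (21, 32)
print(np.array_equal(product, np.dot(a, b)))  # True
print(elementwise)
```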

**exp(…)** *(ufunc)*

-   Element-wise exponential, e raised to the power of each element.

``` python
import numpy as np
arr = np.array([0, 1, 2])
exp_arr = np.exp(arr)
print(exp_arr)
```

output:

    [1.         2.71828183 7.3890561 ]

**expand_dims(…)** *(function)*

-   Expand the shape of an array by inserting a new axis at a given
    position.

``` python
import numpy as np
arr = np.array([1, 2, 3])
expanded_arr_axis0 = np.expand_dims(arr, axis=0) # Add axis at position 0 (row vector)
print(expanded_arr_axis0)
expanded_arr_axis1 = np.expand_dims(arr, axis=1) # Add axis at position 1 (column vector)
print(expanded_arr_axis1)
```

output:

    [[1 2 3]]
    [[1]
     [2]
     [3]]
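
A common reason to add an axis is broadcasting. In this sketch a column of shape (3, 1) is multiplied by the original array of shape (3,), which NumPy broadcasts to a 3×3 multiplication table; `arr[:, np.newaxis]` is an equivalent spelling of `np.expand_dims(arr, axis=1)`:

``` python
import numpy as np

arr = np.array([1, 2, 3])
column = np.expand_dims(arr, axis=1)   # shape (3, 1)
same = arr[:, np.newaxis]              # same shape and values
table = column * arr                   # (3, 1) and (3,) broadcast to (3, 3)
print(np.array_equal(column, same))    # True
print(table)
```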

**flatten(…)** *(method of ndarray)*

-   Return a copy of the array collapsed into one dimension.

``` python
import numpy as np
arr = np.array([[1, 2], [3, 4]])
flattened_arr = arr.flatten()
print(flattened_arr)
```

output:

    [1 2 3 4]
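
`flatten()` always copies. `ravel()` also returns a 1-D array but gives a view when it can, so writes through it may reach the original. A small sketch of the difference (the array here is contiguous, so `ravel()` returns a view):

``` python
import numpy as np

arr = np.array([[1, 2], [3, 4]])
flat_copy = arr.flatten()   # always a new array
flat_view = arr.ravel()     # a view, because arr is contiguous
flat_copy[0] = 100          # arr unchanged
flat_view[1] = 200          # changes arr[0, 1]
print(arr)
```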

**reshape(…)** *(function/method)*

-   Gives a new shape to an array without changing its data.

``` python
import numpy as np
arr = np.arange(6) # Array from 0 to 5
reshaped_arr = np.reshape(arr, (2, 3)) # Function syntax: 2 rows, 3 columns
print(reshaped_arr)
reshaped_arr_method = arr.reshape(3, 2) # Method syntax: 3 rows, 2 columns
print(reshaped_arr_method)
```

output:

    [[0 1 2]
     [3 4 5]]
    [[0 1]
     [2 3]
     [4 5]]
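
One dimension passed to `reshape` may be `-1`, in which case NumPy infers it from the array size; if the size cannot be divided that way, a `ValueError` is raised. A brief sketch:

``` python
import numpy as np

arr = np.arange(6)
column = arr.reshape(-1, 1)   # inferred as 6 rows, 1 column
wide = arr.reshape(2, -1)     # inferred as 2 rows, 3 columns
print(column.shape, wide.shape)   # (6, 1) (2, 3)
try:
    arr.reshape(4, -1)        # 6 elements cannot be split into 4 equal rows
    reshape_failed = False
except ValueError:
    reshape_failed = True
print(reshape_failed)             # True
```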

**sin(…)** *(ufunc)*

-   Element-wise sine of angles in radians.

``` python
import numpy as np
arr = np.array([0, np.pi/2, np.pi])
sin_arr = np.sin(arr)
print(sin_arr)
```

output:

    [0.0000000e+00 1.0000000e+00 1.2246468e-16]

**split(…)** *(function)*

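-   Split an array into a list of sub-arrays. With an integer N, the
    array must divide into N equal parts along the axis (otherwise a
    `ValueError` is raised); a list of indices gives the split points
    instead.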

``` python
import numpy as np
arr = np.arange(9)
split_arr = np.split(arr, 3) # Split into 3 equal sub-arrays
print(split_arr)
arr_2d = np.arange(16).reshape((4, 4))
split_arr_axis1 = np.split(arr_2d, 2, axis=1) # Split along axis 1 (columns), 2 parts
print(split_arr_axis1)
```

output:

    [array([0, 1, 2]), array([3, 4, 5]), array([6, 7, 8])]
    [array([[ 0,  1],
           [ 4,  5],
           [ 8,  9],
           [12, 13]]), array([[ 2,  3],
           [ 6,  7],
           [10, 11],
           [14, 15]])]
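
`np.split` requires an equal division when given a number of sections. `np.array_split` relaxes that, and passing a list of indices to `np.split` sets the cut points directly. A sketch with a 7-element array (chosen here so the parts cannot be equal):

``` python
import numpy as np

arr = np.arange(7)
try:
    np.split(arr, 3)                  # 7 is not divisible by 3
    equal_split_failed = False
except ValueError:
    equal_split_failed = True
uneven = np.array_split(arr, 3)       # sizes 3, 2, 2
at_indices = np.split(arr, [2, 5])    # arr[:2], arr[2:5], arr[5:]
print(equal_split_failed)             # True
print(uneven)
print(at_indices)
```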

**sqrt(…)** *(ufunc)*

-   Element-wise non-negative square root; the result is floating point
    even for integer input.

``` python
import numpy as np
arr = np.array([4, 9, 16])
sqrt_arr = np.sqrt(arr)
print(sqrt_arr)
```

output:

    [2. 3. 4.]

**sum(…)** *(function/method)*

-   Sum of array elements over a given axis.

``` python
import numpy as np
arr = np.array([[1, 2, 3], [4, 5, 6]])
sum_all = np.sum(arr) # Sum of all elements
print(sum_all)
sum_axis0 = np.sum(arr, axis=0) # Sum along axis 0 (columns)
print(sum_axis0)
sum_axis1_method = arr.sum(axis=1) # Using method syntax, sum along axis 1 (rows)
print(sum_axis1_method)
```

output:

    21
    [5 7 9]
    [ 6 15]

**transpose(…)** *(function/method)*

-   Reverse or permute the axes of an array; for a 2-D array, this is the
    ordinary matrix transpose.

``` python
import numpy as np
arr = np.array([[1, 2, 3], [4, 5, 6]])
transposed_arr = np.transpose(arr) # Transpose using function
print(transposed_arr)
transposed_arr_method = arr.T # Transpose using attribute .T (common shorthand)
print(transposed_arr_method)
```

output:

    [[1 4]
     [2 5]
     [3 6]]
    [[1 4]
     [2 5]
     [3 6]]

**zeros(…)** *(function)*

-   Return a new array of given shape and type, filled with zeros.

``` python
import numpy as np
zeros_arr = np.zeros((2, 2)) # 2x2 array of zeros (float64 by default)
print(zeros_arr)
int_zeros_arr = np.zeros((3,), dtype=int) # 1D array of 3 zeros with integer type
print(int_zeros_arr)
```

output:

    [[0. 0.]
     [0. 0.]]
    [0 0 0]

### Pandas

**read_csv(…)** *(function in pandas)*

-   Reads a comma-separated values (CSV) file into a DataFrame.

``` python
import io
import pandas as pd
# In practice you would pass a file path: df = pd.read_csv('data.csv')
# Here an in-memory string stands in for that file so the example is self-contained
csv_text = "col1,col2\n1,3\n2,4\n"
df = pd.read_csv(io.StringIO(csv_text))
print(df)
```

output:

       col1  col2
    0     1     3
    1     2     4

**DataFrame(…)** *(class in pandas)*

-   Constructor to create a DataFrame, a 2D labeled data structure.

``` python
import pandas as pd
data = {'Name': ['Alice', 'Bob', 'Charlie'],
        'Age': [25, 30, 22],
        'City': ['New York', 'London', 'Paris']}
df = pd.DataFrame(data)
print(df)
```

output:

          Name  Age      City
    0    Alice   25  New York
    1      Bob   30    London
    2  Charlie   22     Paris

**Series(…)** *(class in pandas)*

-   Constructor to create a Series, a 1D labeled data structure.

``` python
import pandas as pd
data = [10, 20, 30, 40]
s = pd.Series(data, name='Values', index=['a', 'b', 'c', 'd'])
print(s)
```

output:

    a    10
    b    20
    c    30
    d    40
    Name: Values, dtype: int64

**head(…)** *(method of DataFrame/Series)*

-   Returns the first n rows (or the first n elements of a Series); n
    defaults to 5. Useful for quick inspection.

``` python
import pandas as pd
data = {'col1': range(5), 'col2': range(5, 10)}
df = pd.DataFrame(data)
print(df.head(3)) # First 3 rows
```

output:

       col1  col2
    0     0     5
    1     1     6
    2     2     7

**tail(…)** *(method of DataFrame/Series)*

-   Returns the last n rows (or the last n elements of a Series); n
    defaults to 5. Useful for checking the end of the data.

``` python
import pandas as pd
data = {'col1': range(5), 'col2': range(5, 10)}
df = pd.DataFrame(data)
print(df.tail(2)) # Last 2 rows
```

output:

       col1  col2
    3     3     8
    4     4     9

**info(…)** *(method of DataFrame/Series)*

-   Prints a concise summary of a DataFrame or Series: index range,
    column names, non-null counts, data types and memory usage.

``` python
import pandas as pd
data = {'Name': ['Alice', 'Bob'], 'Age': [25, 30]}
df = pd.DataFrame(data)
df.info()
```

output:

    <class 'pandas.core.frame.DataFrame'>
    RangeIndex: 2 entries, 0 to 1
    Data columns (total 2 columns):
     #   Column  Non-Null Count  Dtype
    ---  ------  --------------  -----
     0   Name    2 non-null      object
     1   Age     2 non-null      int64
    dtypes: int64(1), object(1)
    memory usage: 160.0+ bytes

**describe(…)** *(method of DataFrame/Series)*

-   Generates descriptive statistics. For numeric data: count, mean,
    std, min, quartiles and max. For object data: count, unique, top and
    freq. By default only numeric columns are described.

``` python
import pandas as pd
data = {'Age': [25, 30, 25, 40, 22], 'City': ['London', 'Paris', 'London', 'New York', 'Paris']}
df = pd.DataFrame(data)
print(df.describe()) # Describe numeric column 'Age' by default
print(df.describe(include='object')) # Describe object columns (e.g., 'City')
```

output:

                 Age
    count   5.000000
    mean   28.400000
    std     7.092249
    min    22.000000
    25%    25.000000
    50%    25.000000
    75%    30.000000
    max    40.000000
              City
    count        5
    unique       3
    top     London
    freq         2

**loc\[…\]** *(indexer of DataFrame/Series)*

-   Label-based selection by row/column labels or a boolean array. Unlike
    Python slices, label slices include the end label.

``` python
import pandas as pd
data = {'Name': ['Alice', 'Bob', 'Charlie'], 'Age': [25, 30, 22]}
df = pd.DataFrame(data, index=['row1', 'row2', 'row3'])
print(df.loc['row2']) # Select row with label 'row2'
print(df.loc['row1':'row3', 'Name']) # Rows 'row1' through 'row3' (end label included), column 'Name'
print(df.loc[df['Age'] > 24]) # Boolean indexing: select rows where 'Age' > 24
```

output:

    Name    Bob
    Age      30
    Name: row2, dtype: object
    row1      Alice
    row2        Bob
    row3    Charlie
    Name: Name, dtype: object
           Name  Age
    row1  Alice   25
    row2    Bob   30

**iloc\[…\]** *(indexer of DataFrame/Series)*

-   Integer-position based selection, by integer position or boolean
    array. Slices exclude the end position, as in standard Python.

``` python
import pandas as pd
data = {'Name': ['Alice', 'Bob', 'Charlie'], 'Age': [25, 30, 22]}
df = pd.DataFrame(data)
print(df.iloc[1]) # Select row at integer position 1
print(df.iloc[0:2, 0]) # Select rows from position 0 to 2 (exclusive) and column at position 0
print(df.iloc[[True, False, True]]) # Boolean indexing by position
```

output:

    Name    Bob
    Age      30
    Name: 1, dtype: object
    0    Alice
    1      Bob
    Name: Name, dtype: object
          Name  Age
    0    Alice   25
    2  Charlie   22
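
The two indexers treat slice ends differently: `loc` includes the end label, `iloc` excludes the end position. A small sketch on a made-up three-row frame:

``` python
import pandas as pd

df = pd.DataFrame({'Age': [25, 30, 22]}, index=['row1', 'row2', 'row3'])
by_label = df.loc['row1':'row2']    # end label 'row2' is included
by_position = df.iloc[0:1]          # end position 1 is excluded
print(by_label.index.tolist())      # ['row1', 'row2']
print(by_position.index.tolist())   # ['row1']
```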

**dropna(…)** *(method of DataFrame/Series)*

-   Removes missing values (NaN): by default every row that contains one;
    with `axis=1`, every column that contains one.

``` python
import pandas as pd
data = {'col1': [1, 2, None, 4], 'col2': [5, None, 7, 8]}
df = pd.DataFrame(data)
print(df.dropna()) # Drop rows with any NaN
print(df.dropna(axis=1)) # Drop columns with any NaN (both columns contain one, so both are dropped)
```

output:

       col1  col2
    0   1.0   5.0
    3   4.0   8.0
    Empty DataFrame
    Columns: []
    Index: [0, 1, 2, 3]

**fillna(…)** *(method of DataFrame/Series)*

-   Fills missing values with a specified value (or a dict of per-column
    values). To propagate neighboring valid values, use `ffill()` or
    `bfill()`.

``` python
import pandas as pd
data = {'col1': [1, 2, None, 4], 'col2': [5, None, 7, 8]}
df = pd.DataFrame(data)
print(df.fillna(0)) # Fill NaN with 0
print(df.ffill()) # Forward fill: copy the last valid value down (replaces the deprecated fillna(method='ffill'))
```

output:

       col1  col2
    0   1.0   5.0
    1   2.0   0.0
    2   0.0   7.0
    3   4.0   8.0
       col1  col2
    0   1.0   5.0
    1   2.0   5.0
    2   2.0   7.0
    3   4.0   8.0
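
`fillna` also accepts a dict mapping column names to fill values, so each column can be filled differently. This sketch uses the same data and fills `col1` with 0 and `col2` with its own mean (the mean is an arbitrary choice for illustration):

``` python
import pandas as pd

df = pd.DataFrame({'col1': [1, 2, None, 4], 'col2': [5, None, 7, 8]})
# Column name -> fill value; mean() skips NaN, so col2's mean is (5 + 7 + 8) / 3
filled = df.fillna({'col1': 0, 'col2': df['col2'].mean()})
print(filled)
```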

**astype(…)** *(method of DataFrame/Series)*

-   Casts a Pandas object to a specified dtype.

``` python
import pandas as pd
data = {'col1': [1, 2, 3], 'col2': ['4', '5', '6']}
df = pd.DataFrame(data)
print(df.dtypes)
df['col2'] = df['col2'].astype(int) # Convert 'col2' to integer type
print(df.dtypes)
```

output:

    col1     int64
    col2    object
    dtype: object
    col1    int64
    col2    int64
    dtype: object

**drop(…)** *(method of DataFrame/Series)*

-   Removes rows or columns by labels.

``` python
import pandas as pd
data = {'col1': [1, 2, 3], 'col2': [4, 5, 6], 'col3': [7, 8, 9]}
df = pd.DataFrame(data)
print(df.drop(columns=['col2'])) # Drop column 'col2'
print(df.drop(index=[0, 1])) # Drop rows at index 0 and 1
```

output:

       col1  col3
    0     1     7
    1     2     8
    2     3     9
       col1  col2  col3
    2     3     6     9

**rename(…)** *(method of DataFrame/Series)*

-   Alters column or index labels.

``` python
import pandas as pd
data = {'old_col1': [1, 2], 'old_col2': [3, 4]}
df = pd.DataFrame(data)
print(df.rename(columns={'old_col1': 'new_col1', 'old_col2': 'new_col2'})) # Rename columns
print(df.rename(index={0: 'row_one', 1: 'row_two'})) # Rename index
```

output:

       new_col1  new_col2
    0         1         3
    1         2         4
             old_col1  old_col2
    row_one         1         3
    row_two         2         4

**apply(…)** *(method of DataFrame/Series)*

-   Apply a function along an axis of the DataFrame or Series.

``` python
import pandas as pd
df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
print(df.apply(sum, axis=0)) # Apply sum function to each column (axis=0)
print(df['A'].apply(lambda x: x*2)) # Apply lambda function to Series 'A'
```

output:

    A     6
    B    15
    dtype: int64
    0    2
    1    4
    2    6
    Name: A, dtype: int64
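
With `axis=1` the function receives each row as a Series, which allows combining columns. For simple arithmetic, direct column operations give the same result and are usually faster because they avoid one Python call per row. A sketch:

``` python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
row_totals = df.apply(lambda row: row['A'] + row['B'], axis=1)  # one call per row
vectorized = df['A'] + df['B']                                  # whole-column arithmetic
print(row_totals.tolist())   # [5, 7, 9]
print(vectorized.tolist())   # [5, 7, 9]
```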

**map(…)** *(method of Series)*

-   Used for element-wise transformations on a Series, often with
    dictionaries or functions.

``` python
import pandas as pd
s = pd.Series(['cat', 'dog', 'cat', 'fish'])
mapping = {'cat': 'mammal', 'dog': 'mammal', 'fish': 'non-mammal'}
mapped_series = s.map(mapping) # Map values in Series based on dictionary
print(mapped_series)
```

output:

    0        mammal
    1        mammal
    2        mammal
    3    non-mammal
    dtype: object
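
When `map` is given a dict, values without a matching key become NaN; it also accepts any one-argument function. A sketch (the `'bird'` entry is added here to show the missing-key case):

``` python
import pandas as pd

s = pd.Series(['cat', 'dog', 'bird'])
mapped = s.map({'cat': 'mammal', 'dog': 'mammal'})  # 'bird' has no key, so it maps to NaN
upper = s.map(str.upper)                           # function applied to each element
print(mapped.isna().tolist())   # [False, False, True]
print(upper.tolist())           # ['CAT', 'DOG', 'BIRD']
```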

**groupby(…)** *(method of DataFrame/Series)*

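-   Groups the rows of a DataFrame/Series that share the same value(s) in
    the specified column(s). It returns a GroupBy object; results are
    computed when an aggregation such as `sum()` is applied.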

``` python
import pandas as pd
data = {'Category': ['A', 'B', 'A', 'B', 'A'], 'Values': [10, 20, 15, 25, 12]}
df = pd.DataFrame(data)
grouped_df = df.groupby('Category')
print(grouped_df) # GroupBy object
print(grouped_df.sum()) # Sum of 'Values' for each category
```

output:

    <pandas.core.groupby.generic.DataFrameGroupBy object at 0x...>
              Values
    Category
    A             37
    B             45
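
A GroupBy object can also be iterated, yielding `(key, sub-DataFrame)` pairs, and selecting a column before aggregating returns a Series. A sketch on the same data (`members` and `means` are illustrative names):

``` python
import pandas as pd

df = pd.DataFrame({'Category': ['A', 'B', 'A', 'B', 'A'], 'Values': [10, 20, 15, 25, 12]})
grouped = df.groupby('Category')
# Each group is a sub-DataFrame; rows keep their original order within the group
members = {key: group['Values'].tolist() for key, group in grouped}
means = grouped['Values'].mean()    # Series indexed by Category
print(members)   # {'A': [10, 15, 12], 'B': [20, 25]}
print(means)
```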

**agg(…)** *(method of GroupBy/DataFrame/Series)*

-   Aggregates using one or more operations over specified axis. Often
    used after `groupby()`.

``` python
import pandas as pd
data = {'Category': ['A', 'B', 'A', 'B', 'A'], 'Values': [10, 20, 15, 25, 12]}
df = pd.DataFrame(data)
grouped_df = df.groupby('Category')
aggregated_df = grouped_df.agg(['sum', 'mean']) # Aggregate with sum and mean
print(aggregated_df)
aggregated_df_dict = grouped_df.agg({'Values': ['min', 'max']}) # Aggregate 'Values' with min and max
print(aggregated_df_dict)
```

output:

              Values
                 sum       mean
    Category
    A             37  12.333333
    B             45  22.500000
              Values
                 min  max
    Category
    A             10   15
    B             20   25
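
Passing a list to `agg` produces two-level column labels. Named aggregation (keyword arguments of the form `output_name=(column, function)`) yields flat column names instead. A sketch, where `total` and `average` are names chosen for illustration:

``` python
import pandas as pd

df = pd.DataFrame({'Category': ['A', 'B', 'A', 'B', 'A'], 'Values': [10, 20, 15, 25, 12]})
summary = df.groupby('Category').agg(
    total=('Values', 'sum'),     # column 'total' = sum of Values per group
    average=('Values', 'mean'),  # column 'average' = mean of Values per group
)
print(summary)
```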

**merge(…)** *(function in pandas)*

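-   Combines DataFrame objects with a database-style join on key columns
    or indexes. `how` selects the join type, e.g. `'left'`, `'right'`,
    `'outer'` or `'inner'` (the default).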

``` python
import pandas as pd
df1 = pd.DataFrame({'key': ['K0', 'K1', 'K2'], 'A': ['A0', 'A1', 'A2']})
df2 = pd.DataFrame({'key': ['K0', 'K1', 'K3'], 'B': ['B0', 'B1', 'B2']})
merged_df = pd.merge(df1, df2, on='key', how='left') # Left merge: keep every df1 row; unmatched keys get NaN
print(merged_df)
```

output:

      key   A    B
    0  K0  A0   B0
    1  K1  A1   B1
    2  K2  A2  NaN
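
Other `how` values change which keys survive: `'inner'` keeps keys found in both frames, `'outer'` keeps all keys, and `indicator=True` adds a `_merge` column recording where each row came from. A sketch with the same two frames:

``` python
import pandas as pd

df1 = pd.DataFrame({'key': ['K0', 'K1', 'K2'], 'A': ['A0', 'A1', 'A2']})
df2 = pd.DataFrame({'key': ['K0', 'K1', 'K3'], 'B': ['B0', 'B1', 'B2']})
inner = pd.merge(df1, df2, on='key', how='inner')                  # keys K0, K1
outer = pd.merge(df1, df2, on='key', how='outer', indicator=True)  # keys K0, K1, K2, K3
print(inner)
print(outer[['key', '_merge']])  # _merge is 'both', 'left_only' or 'right_only'
```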

**sort_values(…)** *(method of DataFrame/Series)*

-   Sorts by the values of one or more columns (or rows, with `axis=1`),
    in ascending order by default.

``` python
import pandas as pd
data = {'col1': [3, 1, 2], 'col2': [6, 4, 5]}
df = pd.DataFrame(data)
print(df.sort_values(by='col1')) # Sort by 'col1' column
print(df.sort_values(by=['col2', 'col1'])) # Sort by 'col2', then 'col1'
```

output:

       col1  col2
    1     1     4
    2     2     5
    0     3     6
       col1  col2
    1     1     4
    2     2     5
    0     3     6
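
In the example above both sorts happen to give the same order. Sort direction can be set per key by passing a list to `ascending`; this sketch (with made-up `team`/`score` data) sorts teams A to Z and scores high to low within each team:

``` python
import pandas as pd

df = pd.DataFrame({'team': ['x', 'y', 'x', 'y'], 'score': [3, 9, 7, 1]})
ranked = df.sort_values(by=['team', 'score'], ascending=[True, False])
print(ranked)   # original index order: 2, 0, 1, 3
```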

**value_counts(…)** *(method of Series)*

-   Returns a Series containing counts of unique values.

``` python
import pandas as pd
s = pd.Series(['a', 'b', 'a', 'c', 'b', 'a'])
value_counts = s.value_counts()
print(value_counts)
```

output:

    a    3
    b    2
    c    1
    Name: count, dtype: int64
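
`value_counts(normalize=True)` returns proportions instead of raw counts. A short sketch on the same Series:

``` python
import pandas as pd

s = pd.Series(['a', 'b', 'a', 'c', 'b', 'a'])
shares = s.value_counts(normalize=True)  # each count divided by len(s)
print(shares)   # a: 0.5, b: about 0.333, c: about 0.167
```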