## Python for Data Science, AI & Development

### Working with Data in Python

---

## Pandas

### Reading Data with Pandas 

#### Importing Pandas

In [1]:
import pandas as pd

##### Data Loading

In [None]:
# Excel File
excel = pd.read_excel('file.xlsx')

# CSV File
csv = pd.read_csv('file.csv')
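
The calls above need `file.xlsx` and `file.csv` to exist on disk. As a minimal sketch that runs without any file, the snippet below reads CSV text from an in-memory buffer; the column names and values are made up for illustration.

```python
import io

import pandas as pd

# Hypothetical CSV content, used only so the example runs without a file on disk
csv_text = "name,age\nAlice,25\nBob,30\n"

# read_csv accepts a path or any file-like object
people = pd.read_csv(io.StringIO(csv_text))

print(people.shape)          # (rows, columns)
print(list(people.columns))  # column names taken from the header row
```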

---

### Series with Pandas

##### What's a Series

A Series is a one-dimensional labeled array in Pandas. It can be thought of as a single column of data with a label (index) for each element. You can create a Series from various data sources, such as lists, NumPy arrays, or dictionaries.

In [2]:
data = [10, 20, 30, 40, 50]
s = pd.Series(data)
print(s)

0    10
1    20
2    30
3    40
4    50
dtype: int64
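
A Series can also be built from a dictionary or a NumPy array. The following sketch shows both; the names `prices` and `scores` and their values are illustrative assumptions.

```python
import numpy as np
import pandas as pd

# From a dictionary: the keys become the index labels
prices = pd.Series({'apple': 1.5, 'banana': 0.5, 'cherry': 3.0})

# From a NumPy array, with explicit labels supplied through `index`
scores = pd.Series(np.array([10, 20, 30]), index=['a', 'b', 'c'])

print(prices['banana'])  # look up by label
print(scores['c'])
```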


##### Accessing Elements in a Series

In [None]:
# Access the element with label 2 (value 30)
print(s[2]) 
print()


# Access the element at position 3 (value 40)
print(s.iloc[3]) 
print()

# Access a range of elements by position (the end, 4, is excluded)
print(s[1:4]) 

30

40

1    20
2    30
3    40
dtype: int64
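
With the default integer index, labels and positions look the same, which can hide the difference between the two. The sketch below uses a hypothetical Series `s2` with string labels to make it visible: `.loc` selects by label and includes both endpoints of a slice, while `.iloc` selects by position and excludes the end.

```python
import pandas as pd

# Hypothetical Series with string labels instead of the default 0..4
s2 = pd.Series([10, 20, 30, 40, 50], index=['a', 'b', 'c', 'd', 'e'])

by_label = s2.loc['b':'d']   # label slice: both endpoints are included
by_position = s2.iloc[1:3]   # position slice: the end is excluded

print(by_label.tolist())     # [20, 30, 40]
print(by_position.tolist())  # [20, 30]
```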


##### Series Attributes and Methods

- values: Returns the Series data as a NumPy array.

- index: Returns the index (labels) of the Series.

- shape: Returns a tuple representing the dimensions of the Series.

- size: Returns the number of elements in the Series.

- mean(), sum(), min(), max(): Calculate summary statistics of the data.

- unique(), nunique(): Get unique values or the number of unique values.

- sort_values(), sort_index(): Sort the Series by values or index labels.

- isnull(), notnull(): Check for missing (NaN) or non-missing values.

- apply(): Apply a custom function to each element of the Series.
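
A few of the items listed above can be tried on a small Series. This is a sketch on made-up data (`s3` is a hypothetical name); it includes one missing value to show how the summary methods treat it by default.

```python
import pandas as pd

# Hypothetical data with a repeated value and one missing value
s3 = pd.Series([3, 1, 2, 3, None])

print(s3.shape)           # (5,)
print(s3.size)            # 5
print(s3.sum())           # 9.0, the missing value is skipped by default
print(s3.nunique())       # 3, the missing value is not counted by default
print(s3.isnull().sum())  # 1 missing value

# apply() calls the function once per element
doubled = s3.apply(lambda x: x * 2)

# sort_values() places missing values last by default
ordered = s3.sort_values()
```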

---

### DataFrame with Pandas 

##### What's a DataFrame

A DataFrame is a two-dimensional labeled data structure with columns of potentially different data types. Think of it as a table where each column represents a variable, and each row represents an observation or data point. DataFrames are suitable for a wide range of data, including structured data from CSV files, Excel spreadsheets, SQL databases, and more.

##### Creating DataFrames from Dictionaries

In [6]:
data = {'Name': ['Alice', 'Bob', 'Charlie', 'David'],
        'Age': [25, 30, 35, 28],
        'City': ['New York', 'San Francisco', 'Los Angeles', 'Chicago']}
df = pd.DataFrame(data)
print(df)

      Name  Age           City
0    Alice   25       New York
1      Bob   30  San Francisco
2  Charlie   35    Los Angeles
3    David   28        Chicago


##### Column Selection

In [7]:
print(df['Name'])

0      Alice
1        Bob
2    Charlie
3      David
Name: Name, dtype: object


##### Accessing Rows
You can access rows by integer position using .iloc[] or by index label using .loc[]. With the default index shown above, positions and labels happen to be the same numbers.

In [None]:
# Access the third row by position
print(df.iloc[2])   

print('-----------------------')

# Access the second row by label
print(df.loc[1])    

Name        Charlie
Age              35
City    Los Angeles
Name: 2, dtype: object
-----------------------
Name              Bob
Age                30
City    San Francisco
Name: 1, dtype: object
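
When the index is not the default 0, 1, 2, ..., the two accessors differ more visibly. The sketch below assumes a hypothetical DataFrame `people` indexed by lowercase names.

```python
import pandas as pd

people = pd.DataFrame(
    {'Age': [25, 30, 35],
     'City': ['New York', 'San Francisco', 'Los Angeles']},
    index=['alice', 'bob', 'charlie'],  # hypothetical string labels
)

first_row = people.iloc[0]   # by position: the first row
bob_row = people.loc['bob']  # by label: the row labeled 'bob'

print(first_row['Age'])
print(bob_row['City'])
```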


##### Slicing

You can slice DataFrames to select specific rows and columns.

In [13]:
print(df[['Name', 'Age']])  # Select specific columns
print(df[1:3])              # Select rows at positions 1 and 2 (end excluded)

      Name  Age
0    Alice   25
1      Bob   30
2  Charlie   35
3    David   28
      Name  Age           City
1      Bob   30  San Francisco
2  Charlie   35    Los Angeles
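
Rows and columns can also be selected in one step. As a sketch that rebuilds the same `df` so it runs on its own, the snippet below takes the same block once with `.loc` (labels, end included) and once with `.iloc` (positions, end excluded).

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie', 'David'],
                   'Age': [25, 30, 35, 28],
                   'City': ['New York', 'San Francisco', 'Los Angeles', 'Chicago']})

# Rows labeled 1 through 2 (inclusive) and two columns chosen by name
subset_loc = df.loc[1:2, ['Name', 'Age']]

# The same rows and columns chosen by position (ends excluded)
subset_iloc = df.iloc[1:3, 0:2]

print(subset_loc)
print(subset_iloc)
```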


##### Finding Unique Elements

Use the unique method to get the distinct values in a column of a DataFrame.

In [None]:
unique_ages = df['Age'].unique()
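
Every age in `df` is already distinct, so the effect is easier to see on a column with repeats. The sketch below uses a hypothetical `cities` DataFrame and also shows the related `nunique()` and `value_counts()` methods.

```python
import pandas as pd

# Hypothetical column with a repeated value
cities = pd.DataFrame({'City': ['Chicago', 'New York', 'Chicago', 'Boston']})

unique_cities = cities['City'].unique()  # distinct values, in order of first appearance
n_unique = cities['City'].nunique()      # how many distinct values there are
counts = cities['City'].value_counts()   # how often each value occurs

print(list(unique_cities))
print(n_unique)
print(counts['Chicago'])
```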

##### Conditional Filtering

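You can filter the rows of a DataFrame with a condition built from comparison operators such as >, <, ==, and !=. The comparison produces a Series of True/False values, and indexing with it keeps only the rows where the value is True.

For instance, you can keep only the people older than 25.

In [None]:
older_than_25 = df[df['Age'] > 25]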
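
Conditions can be combined with `&` (and) and `|` (or), with each condition wrapped in its own parentheses. The following is a sketch that rebuilds `df` so it runs on its own; the variable names `over_25` and `selected` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie', 'David'],
                   'Age': [25, 30, 35, 28],
                   'City': ['New York', 'San Francisco', 'Los Angeles', 'Chicago']})

# Single condition: rows where Age is greater than 25
over_25 = df[df['Age'] > 25]

# Two conditions joined with &; the parentheses around each are required
selected = df[(df['Age'] > 25) & (df['City'] != 'Chicago')]

print(over_25['Name'].tolist())   # ['Bob', 'Charlie', 'David']
print(selected['Name'].tolist())  # ['Bob', 'Charlie']
```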

##### Saving DataFrames

To save a DataFrame to a CSV file, use the to_csv method and specify a filename with a “.csv” extension. Pandas provides similar methods for other formats, such as to_excel and to_json.

In [None]:
df.to_csv('trading_data.csv', index=False)
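
To check that a saved file contains what you expect, you can read it back. The sketch below writes to a temporary directory so it leaves nothing behind; the filename `people.csv` and the data are hypothetical.

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Age': [25, 30]})

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'people.csv')  # hypothetical filename
    df.to_csv(path, index=False)  # index=False leaves the row labels out of the file
    restored = pd.read_csv(path)

print(list(restored.columns))
print(restored['Age'].tolist())
```

Without `index=False`, the row labels are written as an extra unnamed first column, which then shows up as a regular column when the file is read back.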

##### DataFrame Attributes and Methods

- shape: Returns the dimensions (number of rows and columns) of the DataFrame.

- info(): Provides a summary of the DataFrame, including data types and non-null counts.

- describe(): Generates summary statistics for numerical columns.

- head(), tail(): Displays the first or last n rows of the DataFrame.

- mean(), sum(), min(), max(): Calculate summary statistics for columns.

- sort_values(): Sort the DataFrame by one or more columns.

- groupby(): Group data based on specific columns for aggregation.

- fillna(), drop(), rename(): Fill missing values, drop rows or columns, or rename labels.

- apply(): Apply a function along each row or column of the DataFrame.
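
Several of these methods are often used together. The sketch below runs a few of them on a hypothetical `sales` table (the column names and numbers are made up).

```python
import pandas as pd

# Hypothetical data with one missing value in 'units'
sales = pd.DataFrame({
    'region': ['east', 'west', 'east', 'west'],
    'units': [10, 5, None, 7],
})

print(sales.shape)  # (4, 2)

# Replace the missing value with 0 before aggregating
sales['units'] = sales['units'].fillna(0)

# Total units per region
totals = sales.groupby('region')['units'].sum()

# Rows ordered from most to fewest units
ranked = sales.sort_values('units', ascending=False)

# rename() returns a new DataFrame; `sales` itself is unchanged
renamed = sales.rename(columns={'units': 'units_sold'})

print(totals)
```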

---

## NumPy

##### What's NumPy

NumPy is a Python library for working with arrays and matrices, with routines for linear algebra and Fourier transforms. NumPy stands for Numerical Python, and it is an open source project. Its array object is called ndarray, and NumPy provides many supporting functions that make working with ndarray objects easy.
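
As a small illustration of what working with an ndarray looks like, the sketch below applies arithmetic to a whole array at once. The temperature readings are made-up values.

```python
import numpy as np

temps_c = np.array([20.0, 22.5, 19.0])  # hypothetical readings in Celsius

# Arithmetic is applied element by element, with no explicit loop
temps_f = temps_c * 9 / 5 + 32

print(type(temps_c).__name__)  # ndarray
print(temps_f)
print(temps_c.mean())
```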

---

#### Importing NumPy

In [2]:
import numpy as np

---

#### NumPy Array

In [4]:
a = np.array([0, 1, 2, 3, 4])
a

array([0, 1, 2, 3, 4])

#### What's the type of the array?

In [5]:
a.dtype

dtype('int64')

This means the array's elements are stored as 64-bit integers (int64), not as floats or strings. The default integer type can differ by platform and NumPy version.
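
The dtype follows from the values used to build the array, and it can be changed with `astype`. A minimal sketch, with illustrative variable names:

```python
import numpy as np

ints = np.array([0, 1, 2])
floats = np.array([0.0, 1.0, 2.0])
mixed = np.array([0, 1.5, 2])   # one float is enough to make the whole array float

# astype returns a converted copy; `ints` keeps its integer dtype
as_float = ints.astype(float)

print(floats.dtype, mixed.dtype, as_float.dtype)
```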

---

#### Assigning a Value in NumPy

In [8]:
a[4] = 5
a

array([0, 1, 2, 3, 5])
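
Assignment also works on slices, and the array keeps its dtype afterwards. The sketch below rebuilds `a` so it runs on its own, and shows that a float written into an integer array loses its fractional part.

```python
import numpy as np

a = np.array([0, 1, 2, 3, 4])

a[1:3] = 10  # one value assigned to every element of the slice
a[0] = 7.9   # stored as 7: the integer array drops the fractional part

print(a)     # [ 7 10 10  3  4]
```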

#### Dimensions and Matrix Product

In [3]:
import numpy as np
X = np.array([[1, 0, 1], [2, 2, 2]])
X.ndim

2
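
Alongside `ndim`, the `shape` attribute gives the size along each dimension, and a two-dimensional array can be indexed and sliced by row and column. A minimal sketch using the same `X`:

```python
import numpy as np

X = np.array([[1, 0, 1], [2, 2, 2]])

print(X.shape)        # (2, 3): 2 rows, 3 columns

first_row = X[0]      # [1, 0, 1]
second_col = X[:, 1]  # every row, column 1: [0, 2]
block = X[:, 0:2]     # every row, columns 0 and 1
```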

In [6]:
X = np.array([[1, 0], [0, 1]])
Y = np.array([[2, 2], [2, 2]])
Z = np.dot(X, Y)
Z

array([[2, 2],
       [2, 2]])
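
For two-dimensional arrays, `np.dot` computes the matrix product, which is different from the element-wise product given by `*`. The sketch below contrasts the two on the same `X` and `Y`; the `@` operator is an equivalent way to write the matrix product here.

```python
import numpy as np

X = np.array([[1, 0], [0, 1]])
Y = np.array([[2, 2], [2, 2]])

matrix_product = X @ Y  # same result as np.dot(X, Y) for 2-D arrays
elementwise = X * Y     # multiplies matching positions only

print(matrix_product)   # [[2 2]
                        #  [2 2]]
print(elementwise)      # [[2 0]
                        #  [0 2]]
```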