
# Introduction to Conda

### What is Conda?
Conda is an open-source package management system and environment management system. It simplifies the installation and management of software packages and their dependencies for Python projects.

### Why use Conda?
Conda allows you to create isolated environments for different projects, ensuring that each project has its own dependencies and packages without affecting others.

### Why is it beneficial to use Conda for Python environment management, especially in machine learning and data science projects?

Firstly, conda's package management system supports other languages, including C++ and R, which are also used in data science and machine learning projects.
Furthermore, conda supports packages such as pandas, NumPy, scikit-learn and SciPy, which are commonly used in those fields.
Lastly, conda can resolve the complex dependencies that scientific computing libraries rely on.

Leidel, J (2023) 12 Reasons to Choose Conda. Available at: https://www.anaconda.com/blog/12-reasons-to-choose-conda 



# Setting up a Python Environment with Conda

### Creating and Managing Environments
Conda allows you to create separate environments for different projects, each with its own set of packages.


### Describe the steps to create a new environment in Conda and install a specific version of Python and a few packages.


### Creating a new environment in Conda and installing a specific version of Python and a few packages
1) Follow the conda installation guide
2) Open the Anaconda prompt
3) Type `conda create --name <env_name> python=3.8`
4) Conda lists the new packages that will be installed and asks whether to proceed. Type `y`
5) Activate the environment with `conda activate <env_name>`
6) To install the libraries, execute `conda install anaconda::numpy anaconda::pandas conda-forge::matplotlib anaconda::scipy`
7) Conda asks again whether to proceed with the installation of the new packages. Type `y`
8) The previously mentioned packages will be downloaded and extracted
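
Once the environment is active, a quick check from Python can confirm which interpreter is running and whether the packages can be imported. The snippet below is a minimal sketch using only the standard library; the helper name `check_environment` and the package list are illustrative assumptions, not part of conda.

```python
import importlib.util
import sys


def check_environment(packages):
    """Return a dict mapping each package name to True if it can be imported."""
    return {name: importlib.util.find_spec(name) is not None for name in packages}


# Hypothetical package list matching the install command above
required = ["numpy", "pandas", "matplotlib", "scipy"]

print("Python version:", sys.version.split()[0])
for name, available in check_environment(required).items():
    print(f"{name}: {'found' if available else 'missing'}")
```

A package reported as missing usually means the script ran outside the activated environment or the install step did not complete.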
                        


# Introduction to Python Basics

Python is a popular programming language for data analysis and machine learning. In this section, we'll cover basic Python syntax and data types.

## Task
Write a Python script that takes a list of numbers and returns a list containing only the even numbers.

In [13]:

# Task
# Using NumPy arrays and boolean indexing
import numpy as np

def get_even_numbers(nums):
    arr = np.array(nums)
    return arr[arr % 2 == 0].tolist()

# Application example
listSample = [1, 4, 13, 43, 44, 65, 80, 120, 133]
even_nums = get_even_numbers(listSample)
print("Even numbers: ", even_nums)


Even numbers:  [4, 44, 80, 120]
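
The same task can also be solved without NumPy. As a minimal sketch, a list comprehension filters the even numbers directly; the function name `get_even_numbers_plain` is a hypothetical name chosen for this example.

```python
def get_even_numbers_plain(nums):
    """Return the even numbers from nums, preserving their order."""
    return [n for n in nums if n % 2 == 0]


sample = [1, 4, 13, 43, 44, 65, 80, 120, 133]
print("Even numbers:", get_even_numbers_plain(sample))
```

For short lists this avoids the NumPy import; for large numerical datasets the boolean-indexing version is likely the better fit.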


### The difference between a list and a tuple in Python.

A list is mutable (dynamic) in Python, whilst a tuple is immutable (static).
Lists in Python are written with square brackets '[]', whilst parentheses '()' are used for tuples.
Tuples are lighter and consume less memory than Python lists.

GeeksforGeeks (2023) Difference Between List and Tuple in Python. Available at: https://www.geeksforgeeks.org/python-difference-between-list-and-tuple/
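
The differences described above can be observed directly. The following is a small sketch; the variable names are illustrative, and the sizes reported by `sys.getsizeof` are specific to the CPython interpreter and cover only the container object itself.

```python
import sys

numbers_list = [1, 2, 3]
numbers_tuple = (1, 2, 3)

# Lists are mutable: items can be changed in place
numbers_list[0] = 10

# Tuples are immutable: item assignment raises TypeError
try:
    numbers_tuple[0] = 10
    tuple_was_modified = True
except TypeError:
    tuple_was_modified = False

print("List after update:", numbers_list)
print("Tuple modified:", tuple_was_modified)

# Size of the container object itself, in bytes (CPython-specific)
print("List size: ", sys.getsizeof(numbers_list))
print("Tuple size:", sys.getsizeof(numbers_tuple))
```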

# Working with NumPy

NumPy is a fundamental package for scientific computing with Python. It provides support for large, multi-dimensional arrays and matrices, along with a large collection of high-level mathematical functions to operate on these arrays.

## Create a NumPy array and perform basic array operations (addition, multiplication).

In [12]:

# Task
import numpy as np

arraySample = np.array([21, 22, 23, 24, 25])

# Basic operations
addition_array = arraySample + 3
print("Addition results:", addition_array)

multiplication_array = arraySample * 5
print("Multiplication results:", multiplication_array)

Addition results: [24 25 26 27 28]
Multiplication results: [105 110 115 120 125]


### Advantages of using NumPy arrays over standard Python lists for numerical data

NumPy arrays are more compact and consume less memory than standard Python lists.
NumPy arrays are more efficient and convenient for performing operations.
NumPy arrays are optimized for numerical computations and support many mathematical functions, including linear algebra and basic statistics.
Element-wise operations need no explicit loop, as the sample code above shows.

GeeksforGeeks (2023) Python Lists vs Numpy Arrays. Available at: https://www.geeksforgeeks.org/python-lists-vs-numpy-arrays/
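
To make the contrast with lists concrete, the sketch below applies the same multiplication to a plain list and to a NumPy array. It reuses the values from the array operations shown earlier; the variable names are illustrative.

```python
import numpy as np

values = [21, 22, 23, 24, 25]

# With a list, multiplying repeats the list instead of scaling each element
repeated = values * 2

# Element-wise scaling of a list needs an explicit loop or comprehension
scaled_list = [v * 5 for v in values]

# With a NumPy array, the same operation applies to every element at once
arr = np.array(values)
scaled_array = arr * 5

print("List * 2:      ", repeated)
print("Comprehension: ", scaled_list)
print("Array * 5:     ", scaled_array)
print("Array mean:    ", arr.mean())
```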


# Data Manipulation with Pandas

Pandas is an open-source data analysis and manipulation tool, built on top of the Python programming language.

### Read a CSV file into a DataFrame, display the first few rows, and filter the data based on a condition.

In [11]:

# Task
# Suppose we use the diabetes.csv dataset
import pandas as pd

# Read CSV into a dataframe
df = pd.read_csv('diabetes.csv')

# Display the first few rows
print("First 5 rows")
print(df.head()) # display the first 5 rows

# Filter data based on condition: BMI is less than 30
bmi_less_than_30 = df.loc[df['BMI'] < 30]
print("BMI less than 30")
print(bmi_less_than_30)


First 5 rows
   AGE  SEX   BMI     BP   S1     S2    S3   S4      S5  S6    Y
0   59    2  32.1  101.0  157   93.2  38.0  4.0  4.8598  87  151
1   48    1  21.6   87.0  183  103.2  70.0  3.0  3.8918  69   75
2   72    2  30.5   93.0  156   93.6  41.0  4.0  4.6728  85  141
3   24    1  25.3   84.0  198  131.4  40.0  5.0  4.8903  89  206
4   50    1  23.0  101.0  192  125.4  52.0  4.0  4.2905  80  135
BMI less than 30
     AGE  SEX   BMI      BP   S1     S2    S3    S4      S5   S6    Y
1     48    1  21.6   87.00  183  103.2  70.0  3.00  3.8918   69   75
3     24    1  25.3   84.00  198  131.4  40.0  5.00  4.8903   89  206
4     50    1  23.0  101.00  192  125.4  52.0  4.00  4.2905   80  135
5     23    1  22.6   89.00  139   64.8  61.0  2.00  4.1897   68   97
6     36    2  22.0   90.00  160   99.6  50.0  3.00  3.9512   82  138
..   ...  ...   ...     ...  ...    ...   ...   ...     ...  ...  ...
436   33    1  19.5   80.00  171   85.4  75.0  2.00  3.9703   80   48
437   60    2  28.2 

### How a Pandas DataFrame is different from a NumPy array

A pandas DataFrame is a two-dimensional (2D) data structure with labelled rows and columns, whilst a NumPy array is a multi-dimensional data structure.
A NumPy array holds elements of a single data type, but a pandas DataFrame can hold a different data type in each column.
The elements of a NumPy array are accessed by their index number, but DataFrame elements can be accessed using both an index number and a named index.

GeeksforGeeks (2023) Difference between Pandas VS NumPy.
Available at: https://www.geeksforgeeks.org/difference-between-pandas-vs-numpy/
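
The points above can be illustrated with a small sketch. The sample data is hypothetical and is not taken from `diabetes.csv`; `df_demo` and `arr_demo` are names chosen for this example.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data with a text column and two numeric columns
df_demo = pd.DataFrame({"name": ["Ann", "Ben"], "age": [34, 29], "bmi": [22.5, 27.1]})

# Each DataFrame column keeps its own data type
print(df_demo.dtypes)

# Access by label and by integer position
print(df_demo.loc[0, "name"])   # row label 0, column "name"
print(df_demo.iloc[1, 2])       # second row, third column

# A NumPy array has one data type for all elements and uses positions only
arr_demo = np.array([[34, 22.5], [29, 27.1]])
print(arr_demo.dtype)
print(arr_demo[1, 1])
```

Note that the integer ages become floats in the array, because all elements must share one data type.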


# Data Visualization with Matplotlib

Matplotlib is a comprehensive library for creating static, animated, and interactive visualizations in Python.

### Basic plot (line plot or bar chart) using data from a Pandas DataFrame.


In [None]:

# Task
# Creating a basic bar chart with the diabetes dataset
import pandas as pd
import matplotlib.pyplot as plt

# Suppose we use the diabetes.csv dataset
df1 = pd.read_csv("diabetes.csv")

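# Average BMI per age, so that each age maps to exactly one bar
mean_bmi_by_age = df1.groupby('AGE')['BMI'].mean()

plt.figure(figsize=(8, 6))
plt.bar(mean_bmi_by_age.index, mean_bmi_by_age.values)
plt.xlabel('AGE')
plt.ylabel('Mean BMI')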
plt.show()



### Importance of data visualization in Data Analysis 

Data visualisation helps reveal patterns, trends and relationships that are not visible in raw data.
Graphs help in finding anomalies, outliers and clusters.
Data visualisation can help create a narrative around the data.
Data visualisation supports data-driven decision-making.


# Basic File Operations in Python

File operations are fundamental for data processing. In this section, we'll cover how to read from and write to files in Python.

### Write a Python script to read a text file and count the frequency of each word in the file.

In [8]:

# Task 
# Counting the frequency of each word in the .txt file using regular expressions and Counter
import re
from collections import Counter

with open('file_test.txt', 'r') as f:
    text_sample = f.read().lower()

# Extract words with a regex; this pattern keeps only words of 2 to 15 letters,
# so single-letter words such as "a" are not counted
words = re.findall(r'\b[a-z]{2,15}\b', text_sample)

counts_word = Counter(words)

for word, count in counts_word.items():
    print(f"{word}: {count}")

hello: 3
yes: 2
this: 1
is: 1
test: 2


### Different modes of opening a file in Python.
- 'r' opens a file for reading only. The file pointer is placed at the beginning of the file. This is the default mode.
- 'r+' opens a file for reading and writing. The file pointer is placed at the beginning of the file.
- 'w' opens a file for writing only. It truncates the file if it exists, otherwise it creates a new file.
- 'w+' opens a file for reading and writing. It creates the file if it does not exist, otherwise it truncates the existing one.
- 'a' opens a file for appending data. It creates a new file if it does not exist. The file pointer is at the end of the file.
- 'a+' opens a file for reading and appending. It creates a new file if it does not exist. The file pointer is at the end of the file.
- 'x' opens a file for exclusive creation and raises an error (`FileExistsError`) if the file already exists.
- 'b' selects binary mode and is combined with another mode, for example 'rb'.
- 't' selects text mode, which is the default.

Available at https://docs.python.org/3/library/functions.html#open
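
The behaviour of a few of these modes can be checked with a short sketch. It writes to a temporary directory so no real files are touched; the file name `modes_demo.txt` is hypothetical.

```python
import os
import tempfile

# Work in a temporary directory that is removed automatically afterwards
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "modes_demo.txt")  # hypothetical file name

    # 'w' creates the file (or truncates it) and writes from the start
    with open(path, "w") as f:
        f.write("first line\n")

    # 'a' keeps the existing content and writes at the end
    with open(path, "a") as f:
        f.write("second line\n")

    # 'r' reads the file from the beginning
    with open(path, "r") as f:
        lines = f.read().splitlines()

    # 'x' refuses to overwrite a file that already exists
    try:
        with open(path, "x") as f:
            f.write("never written\n")
        exclusive_create_failed = False
    except FileExistsError:
        exclusive_create_failed = True

print(lines)
print("Exclusive create failed:", exclusive_create_failed)
```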


# Introduction to SciPy

SciPy is a Python-based ecosystem of open-source software for mathematics, science, and engineering.

### Demonstrate a simple mathematical operation using SciPy, like solving a linear equation.

In [7]:

# Task
import numpy as np
from scipy.linalg import solve

# 2x + 3y = 22
# 6x - 2y = 22

# Coefficient matrix
A = np.array([[2, 3], [6,-2]])

# Right-hand side vector
b = np.array([22, 22])

# Solving the system of equations
x, y = solve(A, b)

print(f"x = {x:.2f}")
print(f"y = {y:.2f}")


x = 5.00
y = 4.00


### Role of SciPy in scientific computing with Python


- SciPy provides solutions to common problems in scientific computing.
- SciPy extends the capabilities of NumPy and provides specialized functions for tasks such as optimization, interpolation, image processing and statistics.

Varoquaux, G et al. (n.d.) 1.5. SciPy : high-level scientific computing. Available at: https://lectures.scientific-python.org/intro/scipy/index.html
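
As a brief illustration of these specialized functions, the sketch below uses `scipy.optimize.minimize_scalar` and `scipy.integrate.quad` on a simple test function. The function `parabola` is a hypothetical example chosen because its minimum and integral are easy to verify by hand.

```python
from scipy.integrate import quad
from scipy.optimize import minimize_scalar


def parabola(x):
    """Simple test function with its minimum at x = 2."""
    return (x - 2) ** 2 + 1


# Optimization: find the x that minimizes the function
result = minimize_scalar(parabola)
print(f"Minimum at x = {result.x:.2f}")

# Integration: area under the function between 0 and 3
area, error_estimate = quad(parabola, 0, 3)
print(f"Integral from 0 to 3 = {area:.2f}")
```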