# Brackets

**Round Brackets `()`**
- function calls: `function_name(argument)`
- tuple creation: `my_tuple = (1,2,3)`

**Square Brackets `[]`**
- list creation: `my_list = [1,2,3]`
- accessing elements: `my_list[0]`
- list comprehensions: `[x**2 for x in range(5)]`
- accessing a character in a string: `"hello"[0]`

**Curly Brackets `{}`**
- dictionary creation: `my_dict = {'key1': 'value1', 'key2': 'value2'}`
- set creation: `my_set = {1, 2, 3, 4}`
- *not* used for code blocks: Python delimits blocks with indentation, not curly brackets
- string formatting (as placeholder indication):  
  `name = "Alice"`  
   `age = 30`  
   `print("Name: {}, Age: {}".format(name, age))`
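
A few bracket pitfalls are easier to see in code. The following is a minimal sketch with illustrative variable names: an empty `{}` creates a dictionary rather than a set, a one-element tuple needs a trailing comma, and f-strings use curly brackets as placeholders in the same way as `str.format`.

```python
# Empty curly brackets create a dict, not a set.
empty_braces = {}
empty_set = set()

# Round brackets alone only group an expression;
# a one-element tuple needs a trailing comma.
not_a_tuple = (1)
one_tuple = (1,)

# Curly brackets mark placeholders in str.format and in f-strings.
name = "Alice"
age = 30
with_format = "Name: {}, Age: {}".format(name, age)
with_fstring = f"Name: {name}, Age: {age}"

print(type(empty_braces).__name__, type(empty_set).__name__)  # dict set
print(type(not_a_tuple).__name__, type(one_tuple).__name__)   # int tuple
print(with_format == with_fstring)                            # True
```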

# Data Structures

- Mutable: Can be modified after creation.  
- Ordered: Elements keep the order in which they were added, so their positions stay predictable when the structure is modified.
- Unique: Contains only unique elements, i.e. duplicates are dropped automatically.
- Accessed by index: Elements can be accessed using square brackets and an index.

|Structure    | Usage | Mutable | Ordered | Unique | Access | Syntax |
|-------------|-------|---------|---------|--------|--------|--------|
| **Tuples**  | for a collection of data that should not change (e.g. coordinates, color codes); <br>  often used to return multiple values from a function | No | Yes | No | Index | `( )` |
| **Lists** | for a collection of items that can change (e.g. task lists, user inputs); <br> often used in loops and comprehensions for data manipulation | Yes | Yes | No | Index |  `[ ]` |
| **Sets** | to ensure unique elements; <br> to perform set operations (e.g. intersection, union, or difference) | Yes | No | Yes | *no indexing* | `{1, 2}` or `set()` <br> (empty `{ }` creates a dictionary) |
| **Dictionaries** | to map unique keys to corresponding values (key-value pairs); <br> for fast lookups and retrieval | Yes | Yes (insertion order, Python 3.7+) | Keys only | Key | `{ }` |
| **Array** <br> (numpy) |  used for numerical computations and operations on large datasets <br> due to their efficiency and optimized implementations | Yes | Yes | No | Index | `numpy.array([])`|



**Sequence Types:** Tuples, Strings, Lists  
**Immutable Objects:** `int`, `float`, `complex`, `bool`, `str`, `tuple`  
**Mutable Objects:** `list`, `dict`, `set`  
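
The properties in the table above can be checked directly. This is a minimal sketch using made-up example data (`point`, `tasks`, `ages` are illustrative names, not part of any library):

```python
# Tuple: ordered and immutable; item assignment raises TypeError.
point = (3, 4)
try:
    point[0] = 10
    tuple_is_immutable = False
except TypeError:
    tuple_is_immutable = True

# List: ordered, mutable, duplicates allowed.
tasks = ["write", "test", "write"]
tasks.append("deploy")

# Set: duplicates are dropped and there is no indexing.
unique_tasks = set(tasks)

# Dictionary: access by key; keys are unique, so assigning to an
# existing key overwrites its value instead of adding a second entry.
ages = {"Alice": 30, "Bob": 25}
ages["Alice"] = 31

print(tuple_is_immutable)    # True
print(tasks)                 # ['write', 'test', 'write', 'deploy']
print(sorted(unique_tasks))  # ['deploy', 'test', 'write']
print(list(ages.items()))    # [('Alice', 31), ('Bob', 25)]
```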


**Difference between NumPy arrays and lists:**
- NumPy arrays hold elements of a single data type (`dtype`), e.g. all integers or all floats
- Python lists can contain elements of different data types
- NumPy arrays typically use less memory and compute faster than Python lists on numerical data, especially for large datasets
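
The difference in element types and in arithmetic behavior can be sketched as follows. This assumes NumPy is installed and imported under the usual alias `np`; the variable names are illustrative.

```python
import numpy as np

values = [1, 2, 3]
arr = np.array(values)

# A list can mix types freely.
mixed_list = [1, 2.5, "three"]

# An array has one dtype: mixing ints and floats converts all to float.
upcast = np.array([1, 2.5, 3])

# `*` repeats a list but multiplies an array element by element.
doubled_list = values * 2
doubled_arr = arr * 2

print(upcast.dtype)          # float64
print(doubled_list)          # [1, 2, 3, 1, 2, 3]
print(doubled_arr.tolist())  # [2, 4, 6]
```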

# Object-oriented programming
*means of structuring programs so that properties and behaviors are bundled into individual objects*  
In Python, everything is an object.

## Object
* collection of data (variables) and methods (functions) that operate on the data
* an instance of a class, which serves as a blueprint for creating objects
* used to model and represent real-world entities and concepts in a program.
* Object-oriented design focuses on:
    * *Encapsulation*: bundling data and the functions that operate on that data into a single component, and restricting access to some of the object's components.
    * *Inheritance*: allows defining a class (child class) that inherits all the methods and properties of another class (parent class).
    * *Polymorphism*: allows a child class to define methods with the same name as methods in its parent class, so the same call behaves differently depending on the object's class (it's more complex than that, but this is enough for our use cases).
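
The three ideas above can be sketched with two small classes. `Animal` and `Dog` are hypothetical names chosen for illustration. Note that Python restricts access only by convention: a leading underscore marks an attribute as internal but does not enforce it.

```python
class Animal:
    def __init__(self, name):
        # Encapsulation: data lives together with the methods below.
        self.name = name
        self._energy = 10  # leading underscore: internal by convention

    def speak(self):
        return f"{self.name} makes a sound"

    def rest(self):
        self._energy += 5
        return self._energy


class Dog(Animal):  # Inheritance: Dog gets __init__ and rest from Animal
    def speak(self):  # Polymorphism: same method name, different behavior
        return f"{self.name} barks"


animals = [Animal("Generic"), Dog("Rex")]
sounds = [animal.speak() for animal in animals]

print(sounds)             # ['Generic makes a sound', 'Rex barks']
print(animals[1].rest())  # 15
```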

**Key Points:**

* **Instances of Classes**
    * Objects are instances of classes
    * A class defines the behavior and structure of objects, <br> including what data attributes they have and what methods can be called on them.
* **Attributes**
    * Objects have attributes, which are data stored within the object.
    * These attributes can be accessed and modified using dot notation (`object.attribute`).
    * Attributes can be variables, lists, dictionaries, functions, or any other data type.
* **Methods**
    * Objects can have methods, which are functions that operate on the object's data.
    * Methods are defined within the class and can be called on the object using dot notation (`object.method()`).
* **Identity**
    * Each object in Python has an identity, which can be obtained using the `id()` function.
    * This identity is guaranteed to be constant for the lifetime of the object and unique among objects that exist at the same time.
* **Type**
    * Every object in Python has a type, which specifies the class of the object.
    * The type of an object can be obtained using the `type()` function.
* **Dynamic Nature**
    * Python is a dynamically typed language: types are checked at runtime, and a variable name can be rebound to objects of different types during execution. The type of an individual object normally does not change.
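
The key points above can be tried out on a small class. This is a minimal sketch; `Counter` here is a hypothetical class defined for the example, not the one from the `collections` module.

```python
class Counter:
    def __init__(self):
        self.count = 0  # attribute: data stored in the object

    def increment(self):  # method: function operating on that data
        self.count += 1
        return self.count


c = Counter()
c.increment()
c.count = 10  # attributes can be modified via dot notation

same = c           # a second name for the same object
other = Counter()  # a different object of the same class

print(c.count)             # 10
print(type(c).__name__)    # Counter
print(id(same) == id(c))   # True
print(id(other) == id(c))  # False

# Dynamic typing: a name can be rebound to an object of another type.
x = 42
first_type = type(x)
x = "forty-two"
second_type = type(x)
print(first_type.__name__, second_type.__name__)  # int str
```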
 

## Class

* A *class* is a special data type which defines how to build a certain kind of object which may contain *variables* and *methods*.
* *Objects* are instances of a *class* that are created following the definition given inside of the class.

# Method vs. Function


## Function
* block of code defined using the `def` keyword.
* can take zero or more arguments, perform some operations based on those arguments, and optionally return a value using the `return` statement
* can be called from anywhere in the code where it is in scope
* not associated with any specific object or class

## Method
`.blabla()`

* function that belongs to an object
* accessed using dot notation `(.)`
* called on an instance of a class
* defined within a class and associated with instances (objects) of that class or with the class itself (in the case of class methods and static methods)
* can access and modify the data (attributes) of the object to which they belong
* can take zero or more arguments, just like functions

## Attribute
`.blabla`
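
The difference between a function, a method, and an attribute can be sketched side by side. `rectangle_area` and `Rectangle` are hypothetical names used only for this example.

```python
def rectangle_area(width, height):
    """Function: not tied to any object, gets all data as arguments."""
    return width * height


class Rectangle:
    def __init__(self, width, height):
        self.width = width    # attributes: accessed without parentheses
        self.height = height

    def area(self):
        """Method: reads the data of the object it is called on."""
        return self.width * self.height

    def scale(self, factor):
        """Method: modifies the data of the object it is called on."""
        self.width *= factor
        self.height *= factor


rect = Rectangle(2, 3)

print(rectangle_area(2, 3))  # 6   function call
print(rect.width)            # 2   attribute access
print(rect.area())           # 6   method call
rect.scale(2)
print(rect.area())           # 24
```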

# Selection vs. Indexing

## Indexing
* process of accessing specific elements or subsets of data within a data structure (like arrays, lists, or DataFrames) by their positions or labels
* comes in different forms, such as integer-based indexing, label-based indexing, or boolean indexing
* you specify the position (integer index) or label of the data you want, and that data is retrieved from the data structure

## Selection
* process of choosing or filtering specific elements or subsets of data based on certain criteria or conditions
* uses logical conditions or filters to pick the data that meets specific requirements
* In pandas, for example, selection can be done with boolean indexing (`DataFrame[condition]`), the `query()` method, or the indexers `.loc[]` and `.iloc[]` (used with square brackets, not called like functions)
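
In pandas the two ideas look like this. The sketch assumes pandas is installed and uses a small made-up DataFrame; the column names and index labels are illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    {"name": ["Alice", "Bob", "Cara"], "age": [30, 25, 41]},
    index=["a", "b", "c"],
)

# Indexing: ask for data by position or by label.
by_position = df.iloc[0, 0]    # first row, first column
by_label = df.loc["b", "age"]  # row labelled "b", column "age"

# Selection: ask for the rows that satisfy a condition.
over_28 = df[df["age"] > 28]          # boolean indexing
over_28_query = df.query("age > 28")  # same filter via query()

print(by_position)                    # Alice
print(by_label)                       # 25
print(over_28["name"].tolist())       # ['Alice', 'Cara']
print(over_28.equals(over_28_query))  # True
```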

# Libraries

| library       | key tools                          | use                                                              |
|---------------|------------------------------------|------------------------------------------------------------------|
| NumPy         | Arrays, Matrices                   | linear algebra, efficient numerical computing                    |
| Pandas        | Series, DataFrames                 | handling data: manipulation, cleaning, analysis, (visualization) |
| Matplotlib    |                                    | plots: customizable, simple plots, high-quality output           |
| Seaborn       |                                    | statistical plotting, works with pandas, less customizable       | 
| scikit-learn  |                                    | machine learning, predictive modeling, data mining, data analysis|
| BeautifulSoup |                                    | parsing HTML and XML, web scraping                               |





## Matplotlib vs Seaborn

### Matplotlib
* Low-level library: powerful, flexible library providing complete control over the creation of plots and visualizations
* Wide range of plots: including line plots, scatter plots, bar plots, histograms, heatmaps, and more
* Customization: allows detailed customization of every aspect of a plot, such as colors, line styles, labels, annotations, and axes
* Support for multiple backends: interactive interfaces for Jupyter notebooks and web applications, as well as saving plots in various formats (e.g., PNG, PDF, SVG).
* Mature and widely used: Matplotlib is a mature library with a large community and extensive documentation


### Seaborn
* Statistical visualization library: *built on top of Matplotlib* and provides a higher-level interface for creating attractive and informative statistical graphics
* Default styles and color palettes: comes with built-in themes and color palettes that make it easy to create visually appealing plots with minimal customization
* Specialized plots: offers specialized functions for visualizing statistical relationships, such as scatter plots with regression lines (`lmplot`), box plots (`boxplot`), violin plots (`violinplot`), and pair plots (`pairplot`)
* Integration with pandas DataFrames: easy plotting of data stored in pandas structures
* Less control but faster workflow: While Seaborn sacrifices some of the fine-grained control offered by Matplotlib, it provides a faster and more intuitive workflow for creating common types of statistical visualizations

### *Which one to choose?*

Use **Matplotlib** if you need complete control over your plots, want to create complex or custom visualizations, or require compatibility with a wide range of backends.  
Use **Seaborn** if you're primarily working with statistical data and want to quickly create attractive, informative visualizations with minimal customization, especially if you're already using pandas for data manipulation.
