<a target="_blank" href="https://colab.research.google.com/github/lukebarousse/Python_Data_Analytics_Course/blob/main/1_Basics/20_Library.ipynb">
  <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
</a>

# Libraries

## Overview

### Notes
* A library is a collection of packages or modules that are grouped together to extend functionality
* Types:
  1. Standard libraries (included with Python)
  2. Third-party libraries & packages (need to be installed)
* Third-party libraries are typically installed using package managers like `pip` or `conda`

### What the heck are Packages then?
- A package can include modules and sub-packages 

### Packages vs Libraries
- Python uses packages to organize the code inside its libraries, and libraries to provide and extend functionality.
- While **all libraries can be considered packages** (if they are structured that way), **not all packages are libraries**.

### Importance 
* Libraries help save time, standardize processes, and add complex functionality easily.
* Used for things like data analysis, machine learning, web development, automation, and more.


## Standard Python Libraries

### Notes

* As with modules, we use the `import` and `from` keywords.
    * `import` loads an entire module
    * `from` pulls specific attributes out of a module directly
* When to use one or the other: 
    * `import module_name`: When you need to access several functions or attributes from a module
    * `from module_name import attribute_name`: Use when you only need a specific function or attribute from a module
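The two import styles can be compared side by side. This is a minimal sketch using the standard library's `math` and `statistics` modules; the variable names are only for illustration.

```python
# Style 1: import the whole module, then reach into it with dot notation
import math

area = math.pi * 2 ** 2   # area of a circle with radius 2
root = math.sqrt(16)      # square root of 16

# Style 2: import only the name you need and call it directly
from statistics import mean

average = mean([1, 2, 3, 4])

print(area, root, average)
```

With the first style every call shows where it comes from (`math.sqrt`), which helps when you use several functions from one module. The second style is shorter when you only need one or two names.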

### Importance of Libraries

### Notes

What if we want to be able to manipulate files? We can use the standard library in Python to open, read, write, and close files.

Functions:
- `open`: Opens a file. The 'r' mode is for reading, 'w' for writing (overwrites content), 'a' for appending, and 'b' for binary mode.
- `read`: Reads the content of an opened file. Can also use `readline()` for a single line or `readlines()` for all lines as a list.
- `write`: Writes a string to an opened file. If the file is opened in append mode ('a'), the text is added at the end.
- `close`: Closes an opened file, which is important for freeing up system resources.
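The modes and functions above can be tried without touching any real data. The sketch below assumes a hypothetical scratch file named `notes.txt` created in a temporary folder; it writes, appends, and then reads the file back.

```python
import os
import tempfile

# Hypothetical scratch file inside a fresh temporary folder
path = os.path.join(tempfile.mkdtemp(), "notes.txt")

# 'w' creates the file (or overwrites it if it already exists)
file = open(path, "w")
file.write("first line\n")
file.close()

# 'a' adds text to the end instead of overwriting
file = open(path, "a")
file.write("second line\n")
file.close()

# 'r' reads; a `with` block closes the file automatically when it ends
with open(path, "r") as file:
    lines = file.readlines()

print(lines)
```

The `with` form is optional here, but it guarantees the file is closed even if an error occurs while reading.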

### Examples

**Note: This example is meant to be done on Google Colab with access to the `sample_data` folder**.

To read the contents of `california_housing_test.csv` and print them:

In [None]:
# Open the file in read mode
file = open('sample_data/california_housing_test.csv')

# Read the file
content = file.read()

# Close the file
file.close()

# Print the content
print(content)

Now, what if we want to get this data into something usable (like a dictionary) so we can actually start performing data analysis?

In [None]:
import csv

data_dict = {}
for index, row in enumerate(csv.reader(content.strip().split('\n'))):
    if index == 0:
        # Initialize dictionary with column headers as keys
        for column in row:
            data_dict[column] = []
    else:
        # Append each element in the row to the correct list in the dictionary
        for col_index, value in enumerate(row):
            data_dict[list(data_dict.keys())[col_index]].append(value)

# Print the dictionary to verify contents
print(data_dict)
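The `csv` module also provides `DictReader`, which treats the first row as column headers. The sketch below is one possible variation on the loop above, not a replacement for it; it assumes a small made-up sample string in place of the real file contents so that it runs anywhere.

```python
import csv
import io

# Made-up sample standing in for the contents of a CSV file
sample = (
    "longitude,latitude,total_bedrooms\n"
    "-122.05,37.37,661\n"
    "-118.30,34.26,310\n"
)

columns = {}
# DictReader yields one dictionary per data row, keyed by the header row
for row in csv.DictReader(io.StringIO(sample)):
    for name, value in row.items():
        # Create the list for a column the first time we see it, then append
        columns.setdefault(name, []).append(value)

print(columns)
```

Note that every value is still a string, so numeric columns would need converting before any math.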


Okay, that's honestly a lot to do each time you want to work with CSV files. Instead, we could use a third-party library called `pandas` to load files a lot more easily.

## Third-Party Libraries

### Example w/ Pandas

With a library like `pandas`, you can read the file and convert it in 3 lines of code!

In [None]:
import pandas as pd

# Create a dataframe from csv file
df = pd.read_csv('sample_data/california_housing_test.csv')

# Print the dataframe
df

# Get the sum of total_bedrooms
sum_total_bedrooms = df['total_bedrooms'].sum()

### Notes

* These are third-party packages and libraries (not included in the Python Standard Library) that need to be installed separately.
* The way you install a package depends on the package manager you are using.
* There are two main ways to do this:
    1. *Pip* - If you use `pip` for package management (Google Colab uses this).
    2. *Conda* - If you use `conda` or `mamba` for package management (We'll use this in the Advanced Chapter).

**NOTE: We'll go more into package management in the advanced chapter.**

### Common Third-Party Libraries

Below are some common third-party libraries used in data science:
- **Pandas**: Offers data structures and operations for manipulating numerical tables and time series.
- **NumPy**: Supports large, multi-dimensional arrays and matrices, along with a collection of high-level mathematical functions.
- **Matplotlib**: A plotting library for creating static, animated, and interactive visualizations in Python.
- **Seaborn**: Provides a high-level interface for drawing attractive and informative statistical graphics.
- **SciPy**: Used for scientific and technical computing, offering modules for optimization, linear algebra, and more.
- **Scikit-learn**: Implements a range of machine learning algorithms for data mining, data analysis, and machine learning tasks.
- **Plotly**: Creates interactive and visually appealing graphs for web publication.

### Where do I find packages?

[PyPI](https://pypi.org/) - for `pip`  
[Anaconda](https://www.anaconda.com/) - for `conda`

### Listing Packages Installed

If you're running this in Google Colab, use the following:

In [4]:
!pip list

Package             Version
------------------- --------
aiohttp             3.9.5
aiosignal           1.3.1
appnope             0.1.4
asttokens           2.4.1
async-timeout       4.0.3
attrs               23.2.0
backcall            0.2.0
Brotli              1.1.0
certifi             2024.2.2
charset-normalizer  3.3.2
colorama            0.4.6
comm                0.2.2
contourpy           1.2.1
cycler              0.12.1
datasets            2.19.0
debugpy             1.8.1
decorator           5.1.1
dill                0.3.8
executing           2.0.1
filelock            3.14.0
fonttools           4.51.0
frozenlist          1.4.1
fsspec              2024.3.1
huggingface_hub     0.22.2
idna                3.7
importlib_metadata  7.1.0
importlib_resources 6.4.0
ipykernel           6.29.3
ipython             8.12.0
jedi                0.19.1
jupyter_client      8.6.1
jupyter_core        4.12.0
kiwisolver          1.4.5
matplotlib          3.8.4
matplotlib-inline   0.1.6
multidict          

If you're running this locally using Conda, run this:

In [5]:
!conda list

# packages in environment at /Users/lukebarousse/opt/anaconda3/envs/python_course:
#
# Name                    Version                   Build  Channel
aiohttp                   3.9.5            py39ha09f3b3_0    conda-forge
aiosignal                 1.3.1              pyhd8ed1ab_0    conda-forge
appnope                   0.1.4              pyhd8ed1ab_0    conda-forge
asttokens                 2.4.1              pyhd8ed1ab_0    conda-forge
async-timeout             4.0.3              pyhd8ed1ab_0    conda-forge
attrs                     23.2.0             pyh71513ae_0    conda-forge
aws-c-auth                0.7.18               hb47d15a_0    conda-forge
aws-c-cal                 0.6.11               hbce485b_0    conda-forge
aws-c-common              0.9.15               h10d778d_0    conda-forge
aws-c-compression         0.2.18               h53e3db5_3    conda-forge
aws-c-event-stream        0.4.2                he461af8_8    conda-forge
aws-c-http                0.8.1              

Note: `!pip list` also works in a conda environment, but it won't show everything that `conda list` does (for example, non-Python packages installed by conda).
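Installed packages can also be listed from Python itself. The sketch below uses the standard library's `importlib.metadata` (available in Python 3.8+); it is an alternative view, not what `pip list` or `conda list` run internally, and the output depends entirely on your environment.

```python
from importlib import metadata

# Collect (name, version) pairs for every installed distribution
installed = []
for dist in metadata.distributions():
    name = dist.metadata["Name"]
    if name:  # skip entries with missing metadata
        installed.append((name, dist.version))

# Sort alphabetically, ignoring upper/lower case
installed.sort(key=lambda pair: pair[0].lower())

# Show the first ten entries
for name, version in installed[:10]:
    print(f"{name:<25} {version}")
```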

### Installing Packages

#### Notes

- Once again, whether you use `pip install` or `conda install` depends on your package manager.

#### Examples

##### `pip` Install (Google Colab example)

Google Colab comes with a bunch of libraries already installed, but here's one it doesn't have:

In [None]:
!pip install pyjokes

##### `conda` install (Local Example) 

**Note: This is an example ONLY if you are running locally and have Anaconda installed (on Colab, pandas is already installed)**.

Since `pandas` is outside of Python's standard library, we can install it with conda.

In [None]:
!conda install pandas

## Import the Package

Now that we've installed a library, we need to import it. This lets us use it in our specific notebook / environment (we'll get more into environments later in the advanced section).

We will be showing how to import Python libraries, packages and modules. Here's a reminder of the difference between all 3:
1. **Library**: A collection of packages and modules.
2. **Package**: A directory with Python scripts and an `__init__.py` file.
3. **Module**: A Python script file that can be imported.
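To make the package and module definitions concrete, the sketch below builds a tiny package on the fly and imports from it. The names `demo_toolbox`, `greetings`, and `hello` are hypothetical and exist only for this illustration.

```python
import importlib
import os
import sys
import tempfile

# Create a folder named demo_toolbox inside a temporary directory
root = tempfile.mkdtemp()
package_dir = os.path.join(root, "demo_toolbox")
os.mkdir(package_dir)

# __init__.py marks the folder as a package (it can be empty)
with open(os.path.join(package_dir, "__init__.py"), "w") as f:
    f.write("")

# greetings.py is a module: a Python file that can be imported
with open(os.path.join(package_dir, "greetings.py"), "w") as f:
    f.write("def hello(name):\n    return f'Hello, {name}!'\n")

# Tell Python where to look, then import the module from the package
sys.path.insert(0, root)
importlib.invalidate_caches()
from demo_toolbox import greetings

message = greetings.hello("world")
print(message)
```

A library is then simply one or more such packages and modules distributed together.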

### Examples

In [None]:
import pyjokes

pyjokes.get_joke()
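If an import fails with `ModuleNotFoundError`, the package is not installed in the current environment. As a minimal sketch, you can check for a package before importing it with the standard library's `importlib.util.find_spec`; the helper name `is_installed` is made up for this example.

```python
import importlib.util

def is_installed(package_name):
    """Return True if a top-level package or module can be found."""
    return importlib.util.find_spec(package_name) is not None

# csv ships with Python, so this is always available
print(is_installed("csv"))

# pyjokes is third-party, so the result depends on your environment
print(is_installed("pyjokes"))
```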