## **7. Using Files as Input: Reading Data from the Outside World**

So far, all the data we've worked with has been created directly inside our code (e.g., `start_price = 225.40`). In any real-world business or finance scenario, this is rarely the case. Data lives in files: transaction logs, market data exports, configuration settings, and reports.

To build powerful applications, our Python scripts must be able to **read** data from these files. This process is called **File I/O** (Input/Output). We'll learn how to navigate the file system to find our files and how to parse different file formats, from simple text to structured data like JSON and YAML.

### **7.1 Navigating Your Computer: The `os` Module**
Before you can open a file, you need to know where it is. The `os` module in Python provides a way to interact with the operating system, allowing you to work with file paths and directories.

**Key Functions:**
*   `os.getcwd()`: **G**et **C**urrent **W**orking **D**irectory. Returns the directory your script is currently running in; relative paths are resolved against it.
*   `os.listdir(path)`: Lists all files and folders in a given directory path. This is key for processing multiple files in a folder.
*   `os.path.exists(path)`: Returns `True` if a file or directory exists at the given path, `False` otherwise. Crucial for checking if a file is available before trying to open it.
*   `os.path.join(path1, path2, ...)`: Intelligently combines path components. It automatically uses the correct separator (`/` or `\`) for your operating system, making your code portable.
*   `os.makedirs(path, exist_ok=True)`: Creates a directory, along with any missing parent directories. `exist_ok=True` prevents an error if the directory already exists.

```python
import os

# Find out where we are
current_directory = os.getcwd()
print(f"Current Directory: {current_directory}")

# Build a path to a 'data' folder in a safe, portable way
data_folder_path = os.path.join(current_directory, 'data')
print(f"Constructed Path: {data_folder_path}")

# Safely create the directory if it doesn't exist
os.makedirs(data_folder_path, exist_ok=True)

# List what's in the current directory
print(f"Contents of '.': {os.listdir('.')}") # '.' is shorthand for the current directory

# Check if our new folder exists
print(f"Does the 'data' folder exist? {os.path.exists(data_folder_path)}")
```

```text
Current Directory: /mnt/c/Users/gabol/Desktop/Audencia/PythonForFinance/tds/3.StructuresLoopsFunctions
Constructed Path: /mnt/c/Users/gabol/Desktop/Audencia/PythonForFinance/tds/3.StructuresLoopsFunctions/data
Contents of '.': ['4.ListSetDictTuples.ipynb', '4bis.4.ListSetDictTuplesProblem.ipynb', '5.ForWhileIf.ipynb', '5bis.ForWhileIfProblem.ipynb', '6.FunctionClasses.ipynb', '6bis.FunctionClassesProblem.ipynb', '7. FileSytem.ipynb', 'data', 'finance_tools.py', '__pycache__']
Does the 'data' folder exist? True
```
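
Since `os.listdir()` is described above as the key to processing multiple files in a folder, here is a minimal sketch of that idea. The helper name `list_files_with_extension` and the file names are made up for illustration, and the demo runs inside a temporary folder so it does not touch your own files.

```python
import os
import tempfile

def list_files_with_extension(folder, extension):
    """Return the sorted full paths of files in `folder` whose names end with `extension`."""
    matches = []
    for name in os.listdir(folder):
        full_path = os.path.join(folder, name)
        # os.listdir() returns files and folders alike, so keep regular files only
        if os.path.isfile(full_path) and name.endswith(extension):
            matches.append(full_path)
    return sorted(matches)

# Demo in a throwaway folder (hypothetical file names)
with tempfile.TemporaryDirectory() as demo_folder:
    for name in ["q1.csv", "q2.csv", "notes.txt"]:
        with open(os.path.join(demo_folder, name), "w") as f:
            f.write("placeholder\n")
    # A folder whose name ends in .csv, to show it is filtered out
    os.makedirs(os.path.join(demo_folder, "archive.csv"))

    csv_paths = list_files_with_extension(demo_folder, ".csv")
    csv_names = [os.path.basename(p) for p in csv_paths]
    print(csv_names)
```

Filtering with `os.path.isfile()` matters because a folder can have a name that looks like a file name; without the check, the loop would later try to open a directory.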


#### **Practice Session: `os` Module**
1.  **Where Am I?**: Import the `os` module and print your current working directory.
2.  **Make a Home for Data**: Create a new directory called `reports` using `os.makedirs()`.
3.  **File Detective**: Check if a file named `important_data.csv` exists in your current directory. (It probably doesn't, so the output should be `False`).
4.  **Path Builder**: Construct the full path to a file called `q1_summary.txt` that would live inside your new `reports` directory, using `os.path.join()`.

### **7.2 Reading Plain Text Files (`.txt`)**
The simplest type of file is a plain text file, often used for logs or simple lists. The best practice for handling files is using the `with open(...)` statement. It creates a temporary context and **automatically closes the file for you** when you are done, even if an error occurs.

**Syntax:**
```python
with open('path/to/your/file.txt', 'r') as file_object:
    # Code to work with the file goes here
    # The file is automatically closed after this block
```
*   `'r'`: The **mode** for opening the file. `'r'` is for **read**. Other common modes are `'w'` (write, which overwrites the file) and `'a'` (append, which adds to the end of the file).
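
To make the difference between the three modes concrete, the following sketch writes, appends to, and then overwrites a small file. The file name `trade_log.txt` and its contents are invented for the demonstration, and the file lives in a temporary folder that is deleted at the end.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as demo_folder:
    demo_path = os.path.join(demo_folder, "trade_log.txt")  # hypothetical file

    # 'w' creates the file (or empties it if it already exists)
    with open(demo_path, "w") as f:
        f.write("BUY AAPL\n")

    # 'a' keeps the existing content and adds to the end
    with open(demo_path, "a") as f:
        f.write("SELL MSFT\n")

    with open(demo_path, "r") as f:
        after_append = f.read()

    # Opening with 'w' again discards everything written so far
    with open(demo_path, "w") as f:
        f.write("BUY JPM\n")

    with open(demo_path, "r") as f:
        after_overwrite = f.read()

print(repr(after_append))
print(repr(after_overwrite))
```

The practical takeaway: use `'a'` for logs you want to keep growing, and use `'w'` only when you really intend to replace the file.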

#### **Methods for Reading Text Files:**
Once the file is open, you can read its contents in several ways.
```python
# Assume we have a file 'tickers.txt' with the content:
# AAPL
# MSFT
# GOOG

# Method 1: Looping line by line (Best for large files)
print("--- Looping Line by Line ---")
with open('tickers.txt', 'r') as f:
    for line in f:
        # .strip() removes leading and trailing whitespace, including the newline character
        print(f"Processing ticker: {line.strip()}")

# Method 2: Read all lines into a list of strings
with open('tickers.txt', 'r') as f:
    lines_list = f.readlines()
    # print(lines_list) # Result: ['AAPL\n', 'MSFT\n', 'GOOG\n']
```
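
The reading methods above can be combined with `os.path.exists()` from section 7.1 so that a missing file does not crash the script. This is only one possible approach; the helper name `read_tickers` is hypothetical, and the demo creates its own `tickers.txt` in a temporary folder.

```python
import os
import tempfile

def read_tickers(path):
    """Return the non-empty lines of `path` as a clean list, or [] if the file is missing."""
    if not os.path.exists(path):
        print(f"File not found: {path}")
        return []
    with open(path, "r") as f:
        # Strip each line and skip lines that are empty after stripping
        return [line.strip() for line in f if line.strip()]

with tempfile.TemporaryDirectory() as demo_folder:
    tickers_path = os.path.join(demo_folder, "tickers.txt")
    with open(tickers_path, "w") as f:
        f.write("AAPL\nMSFT\nGOOG\n")

    tickers = read_tickers(tickers_path)
    missing = read_tickers(os.path.join(demo_folder, "no_such_file.txt"))

print(tickers)
print(missing)
```

Returning an empty list for a missing file is a design choice made here for simplicity; in other situations you may prefer to let Python raise `FileNotFoundError` so the problem is not silently ignored.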

#### **Practice Session: Reading `.txt`**
First, let's create a file to work with. Run the following code cell to create `portfolio.txt`:
```python
with open('portfolio.txt', 'w') as f:
    f.write("MSFT,100,330.50\n")
    f.write("AAPL,150,175.25\n")
    f.write("JPM,200,155.20\n")
```

Now, complete these tasks:
1.  **Read and Print**: Open `portfolio.txt` and use a `for` loop to read and print each line.
2.  **Clean Data**: Modify your loop from step 1. For each line, use `.strip()` to remove the newline character and then `.split(',')` to turn the string into a list (e.g., `['MSFT', '100', '330.50']`). Print the resulting list for each line.
3.  **Calculate Total Shares**: Building on step 2, initialize a `total_shares` variable to 0. Inside your loop, access the number of shares (it will be a string, so you'll need to convert it to an `int`), and add it to `total_shares`. After the loop, print the final total.

### **7.3 Reading Structured Data: JSON (`.json`)**
Most modern data isn't just a list of lines; it's **structured**. JSON (JavaScript Object Notation) is one of the most common formats for structured data exchange. Its objects map almost perfectly to Python dictionaries, and its arrays map to Python lists. Python's built-in `json` module parses a JSON file directly into these Python objects.

**Example `config.json` file:**
```json
{
  "portfolio_name": "Client Growth Portfolio",
  "analyst": "John Doe",
  "risk_level": "moderate",
  "watch_list": ["NVDA", "TSLA", "AMZN"]
}
```

**Reading the JSON file:**
```python
import json

with open('config.json', 'r') as f:
    # Use json.load() to parse the file object
    config_data = json.load(f)

# config_data is now a standard Python dictionary!
print(type(config_data))
print(f"Portfolio Name: {config_data['portfolio_name']}")
```
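
To see how JSON values turn into Python types, the sketch below parses a JSON string with `json.loads()` (the string-based sibling of `json.load()`) and prints the type of each value. The keys `max_positions`, `target_return`, `is_active`, and `benchmark` are invented for the illustration and are not part of `config.json` above.

```python
import json

# A JSON document held in a string; several keys are hypothetical
raw_text = """
{
  "portfolio_name": "Client Growth Portfolio",
  "watch_list": ["NVDA", "TSLA", "AMZN"],
  "max_positions": 10,
  "target_return": 0.08,
  "is_active": true,
  "benchmark": null
}
"""

# json.loads() parses a string; json.load() parses an open file object
parsed = json.loads(raw_text)

for key, value in parsed.items():
    print(f"{key}: {value!r} ({type(value).__name__})")

type_names = {key: type(value).__name__ for key, value in parsed.items()}
```

Note the small spelling differences: JSON writes `true` and `null`, while Python gives you `True` and `None` after parsing.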

#### **Practice Session: Reading `.json`**
First, run this cell to create `stock_profile.json`:
```python
import json

data = {
    "ticker": "NVDA",
    "company_name": "NVIDIA Corporation",
    "sector": "Technology",
    "last_price": 450.75,
    "is_profitable": True
}
with open('stock_profile.json', 'w') as f:
    json.dump(data, f, indent=2) # json.dump() writes a dict to a file
```

Now, complete these tasks:
1.  **Load the Profile**: Import the `json` module. Open `stock_profile.json` and load its contents into a variable called `profile`.
2.  **Access Data**: Print the company's name and its sector from the `profile` dictionary.
3.  **Conditional Check**: Use an `if` statement to check if the value of the `"is_profitable"` key is `True`. If it is, print a confirmation message.

### **7.4 Reading Structured Data: YAML (`.yaml`)**
YAML (YAML Ain't Markup Language) is another popular format for structured data, prized for being more human-readable than JSON. It's very common for configuration files in data science and software development. Python does not have a built-in YAML parser, so you need to install a third-party library.

**Installation:**
```python
!pip install pyyaml
```

**Example `settings.yaml` file:**
```yaml
report_settings:
  title: "Quarterly Performance Review"
  include_charts: true
database:
  host: "prod.db.server.com"
  port: 5432
```

**Reading the YAML file:**
```python
import yaml

with open('settings.yaml', 'r') as f:
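    # yaml.safe_load() only builds plain Python types (dict, list, str, numbers, booleans);
    # prefer it over yaml.load() with an unsafe loader, which can run code from untrusted files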
    settings_data = yaml.safe_load(f)

# settings_data is also a standard Python dictionary!
print(type(settings_data))
print(f"Report Title: {settings_data['report_settings']['title']}")
```
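
As with JSON, YAML values arrive as ordinary Python types. The sketch below parses the same settings from a string (`yaml.safe_load()` accepts strings as well as file objects) and reads a nested value with a fallback. It assumes `pyyaml` is installed, and the key `timeout_seconds` is hypothetical: it is deliberately absent from the data to show how `.get()` supplies a default.

```python
import yaml  # third-party: installed with `pip install pyyaml`

yaml_text = """
report_settings:
  title: "Quarterly Performance Review"
  include_charts: true
database:
  host: "prod.db.server.com"
  port: 5432
"""

settings = yaml.safe_load(yaml_text)

# Unquoted numbers and true/false are converted to int and bool automatically
port = settings["database"]["port"]
include_charts = settings["report_settings"]["include_charts"]
print(type(port).__name__, port)
print(type(include_charts).__name__, include_charts)

# .get() returns a default instead of raising KeyError when a key is absent
timeout = settings["database"].get("timeout_seconds", 30)  # hypothetical key
print(f"Timeout: {timeout}")
```

Using `.get()` with a default is a common way to keep configuration files short: only the settings that differ from the defaults need to be written down.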

#### **Practice Session: Reading `.yaml`**
First, run this cell to create `model_params.yaml`:
```python
# Note: You don't need to import yaml to write a yaml string
yaml_content = """
model_name: "Linear Regression"
version: 1.2
parameters:
  learning_rate: 0.01
features:
  - "market_cap"
  - "pe_ratio"
"""
with open('model_params.yaml', 'w') as f:
    f.write(yaml_content)
```

Now, complete these tasks:
1.  **Install Library**: If you haven't already, run `!pip install pyyaml`.
2.  **Load Parameters**: Import the `yaml` module. Open `model_params.yaml` and load its contents into a variable called `params`.
3.  **Access Nested Data**: Print the `learning_rate` from the loaded data.
4.  **List Features**: Access the list of `features` and use a `for` loop to print each feature.