# Q&A during my study

I decided to keep track of all my confusions and questions during my study, many of which are not answered in the cookbooks or tutorials I found. I guess the reason is that these questions seem too obvious or too trivial to people with a technical background or experience. For a beginner from another field (me), the entire world of IT poses a stream of questions that would annoy "experts".

The Q&A is not in any specific order; it is just a collection of my questions and answers. The answers usually come from my own understanding, fortified by AI tools like ChatGPT. I try to validate them by running code snippets or asking follow-up questions along the way, so it's not blind trust in AI or any other source.



## About Python Basics (or NOT-So-Basics)

### What is the weird underscore and double underscore in Python?
In Python, a single underscore (_) before a variable name is a convention to indicate that the variable is intended for internal use. It is a hint to other programmers that the variable is "private" and should not be accessed directly from outside the class or module.

A double underscore (`__`) before an attribute name inside a class triggers name mangling: the interpreter rewrites `__attr` to `_ClassName__attr`. This makes it harder for subclasses to accidentally override the parent's private attributes and methods, and it is a stronger indication that the name is intended for internal use. (Names that both start and end with double underscores, like `__init__`, are not mangled.)

Here's a quick summary:

- **Single underscore (`_`)**: weak "internal use" indicator.

- **Double underscore (`__`)**: strong "internal use" indicator with name mangling.

However, neither one is real access control. The single underscore is purely a convention, and name mangling only renames the attribute, so `obj._ClassName__attr` still works. People can access these names if they really want to, but doing so from outside the class or module is generally considered bad practice.
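A small sketch to see both conventions in action. The class names `Base` and `Child` and the attribute names are made up for illustration; the point is that the child's `__secret` and the parent's `__secret` end up as two different mangled attributes, and both are still reachable from outside.

```python
class Base:
    def __init__(self):
        self._internal = "single underscore: just a hint"
        self.__secret = "double underscore: name-mangled"  # stored as _Base__secret

    def reveal(self):
        # Inside Base, self.__secret is rewritten to self._Base__secret
        return self.__secret


class Child(Base):
    def __init__(self):
        super().__init__()
        # Stored as _Child__secret, so it does NOT overwrite Base's value
        self.__secret = "child's own value"


c = Child()
print(c._internal)       # readable from outside; the underscore is only a convention
print(c.reveal())        # Base's value survived the subclass assignment
print(c._Child__secret)  # the child's separate attribute
print(c._Base__secret)   # still reachable if you know the mangled name

try:
    c.__secret  # outside a class body there is no mangling, so this name does not exist
except AttributeError:
    print("no attribute called plain '__secret'")
```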

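An extra note on import statements: a leading underscore also matters for `from module import *`. With a plain `import module`, every attribute is reachable, including something like `module._helper`. With `from module import *`, however, names with **a leading underscore** are **NOT** imported (unless the module explicitly lists them in `__all__`).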
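A quick way to check this without creating any files: build a module object in memory, register it in `sys.modules`, and import it both ways. The module name `demo_mod` and its attributes are hypothetical; in a real project the module would be a `.py` file.

```python
import sys
import types

# Build a throwaway module in memory instead of a .py file on disk
fake = types.ModuleType("demo_mod")
fake.public_value = 1
fake._private_value = 2
sys.modules["demo_mod"] = fake  # the import system checks sys.modules first

# Plain import: everything is reachable through the module object
import demo_mod as m
print(m.public_value, m._private_value)  # 1 2

# Star import: names starting with "_" are skipped (demo_mod defines no __all__)
namespace = {}
exec("from demo_mod import *", namespace)
print("public_value" in namespace)    # True
print("_private_value" in namespace)  # False
```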

### What is the `__init__.py` file in some Python packages?

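It marks the directory as a regular Python package, and its code runs the first time the package (or anything inside it) is imported. This allows the package to be imported and used in other Python scripts. (Since Python 3.3, a directory without `__init__.py` can still be imported as a "namespace package", but `__init__.py` remains the usual way to define a regular package.)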

The package structure may look like this:

```
my_package/
    __init__.py
    module1.py
    module2.py
    sub_package/
        __init__.py
        module3.py
```

Then, you can import the package and its modules in your Python scripts like this:

```python
from my_package import module1
from my_package.sub_package import module3
```

The content of `__init__.py` files can vary, but they often include package initialization code or import statements to make it easier to access submodules. For example, the `__init__.py` file in the `my_package` directory might look like this:

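```python
# my_package/__init__.py
# Re-export the public names of each module at package level
from .module1 import *
from .module2 import *
from .sub_package import *  # only brings in what sub_package/__init__.py itself exposes
```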

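This way, after `import my_package`, the public names defined in `module1` and `module2` (plus whatever `sub_package/__init__.py` exposes) are available directly on the package. For example, a function `f` defined in `module1` can be used as `my_package.f` instead of `my_package.module1.f`.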
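To see both effects (the `__init__.py` code running on import, and the re-export), here is a sketch that writes a tiny throwaway package to a temporary directory. The package layout and the `greet` function are hypothetical, trimmed down from the structure above.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Create a tiny throwaway package on disk (contents are made up for illustration)
root = Path(tempfile.mkdtemp())
pkg = root / "my_package"
pkg.mkdir()
(pkg / "module1.py").write_text("def greet():\n    return 'hello from module1'\n")
(pkg / "__init__.py").write_text(
    "print('running my_package/__init__.py')\n"
    "from .module1 import *  # re-export greet at package level\n"
)

sys.path.insert(0, str(root))  # make the temp directory searchable for imports
importlib.invalidate_caches()  # recommended when files are created at runtime

import my_package  # __init__.py runs here, so its print appears once

print(my_package.greet())  # short path, thanks to the re-export
from my_package import module1
print(module1.greet())     # the full module path still works
```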


### What is the weird `__init__` code in some Python scripts?

In Object-Oriented Programming (OOP), the `__init__` method is a special method used for initializing newly created objects. It is called automatically when a new instance of a class is created. The `__init__` method can take additional arguments to customize the initialization process.

Here's a simple example:

```python
class Dog:
    def __init__(self, name, age):
        self.name = name
        self.age = age

my_dog = Dog("Buddy", 3)
```

In this example, `__init__` sets the `name` and `age` attributes on each new `Dog` object, so every instance gets its own values.

We never call `__init__` ourselves: writing `Dog("Buddy", 3)` makes Python create the object and then call `__init__` on it, passing the new object as `self` along with `"Buddy"` and `3`. That is why we can barely "feel" its presence when creating objects.
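A slightly extended version of the `Dog` class makes the automatic call visible. The `print` inside `__init__` and the default `age=1` are additions for illustration only.

```python
class Dog:
    def __init__(self, name, age=1):
        # Python calls this automatically right after creating the object
        print(f"__init__ called for {name}")
        self.name = name  # instance attribute: each Dog stores its own name
        self.age = age


a = Dog("Buddy", 3)   # prints: __init__ called for Buddy
b = Dog("Rex")        # age falls back to the default of 1
print(a.name, a.age)  # Buddy 3
print(b.name, b.age)  # Rex 1
```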

### What’s the difference between `load_dotenv()` and using `os` to read environment variables? Can they coexist or do you pick one?
`load_dotenv()` (from the python-dotenv library) loads variables from a `.env` file into the process environment (`os.environ`). After loading, you can read them with `os.getenv()` or `os.environ.get()`.

Using `os` (e.g., `os.environ` or `os.getenv()`) reads variables that are already in the environment, whether they were set by the OS or loaded via `load_dotenv()`.

They can coexist: call `load_dotenv()` once at startup to load the `.env` file, then read values via `os` everywhere else. If you don't have a `.env` file, you can just use `os` to read whatever is already in the environment. By default, `load_dotenv()` does not override variables that already exist in the environment, so values set by the OS (for example in production) take precedence over the `.env` file. This combination is a common and convenient way to manage settings.
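Whether a variable came from the OS or was copied in by `load_dotenv()`, reading it through `os` looks the same. The sketch below uses only the standard library so it runs without python-dotenv installed; the names `APP_MODE` and `DEMO_MISSING_VAR` are made up, and the `os.environ[...] = ...` line stands in for what the shell or `load_dotenv()` would normally do.

```python
import os

# Stand-in for a variable set by the OS or loaded by load_dotenv()
os.environ["APP_MODE"] = "production"
os.environ.pop("DEMO_MISSING_VAR", None)  # make sure this one is really absent

# With python-dotenv installed, a real script would typically start with:
#     from dotenv import load_dotenv
#     load_dotenv()  # copies .env entries into os.environ (existing ones are kept by default)

# os.getenv returns None, or the default you pass, when the variable is missing
mode = os.getenv("APP_MODE")
missing = os.getenv("DEMO_MISSING_VAR")
fallback = os.getenv("DEMO_MISSING_VAR", "development")
print(mode, missing, fallback)  # production None development

# os.environ behaves like a dict: square brackets raise KeyError when missing
try:
    os.environ["DEMO_MISSING_VAR"]
except KeyError:
    print("DEMO_MISSING_VAR is not set")
```

`os.getenv` with a default suits optional settings, while `os.environ[...]` suits required ones, because the program fails loudly when they are missing.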

## About databases

## About data structures

### Difference between `tokenize` and `vectorize`?
- `tokenize` is the process of breaking down text into individual tokens (words, phrases, etc.), which is often the first step in text processing.

- `vectorize` is the process of converting these tokens into numerical vectors, which can be used for machine learning or other computational tasks. Common techniques include TF-IDF, word embeddings, and one-hot encoding. In TF-IDF, for example, each document becomes a vector with one dimension per word in the vocabulary, and each value reflects how important that word is in the document relative to the whole corpus. One problem with TF-IDF is that these vectors are very sparse: any single document uses only a small fraction of the vocabulary, so most entries are zero.
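A from-scratch sketch of both steps on a made-up three-sentence corpus. It uses the plain textbook weighting `tf(t, d) × log(N / df(t))`, where `tf` is the word's share of the document, `N` is the number of documents and `df` is how many documents contain the word. Libraries such as scikit-learn's `TfidfVectorizer` use smoothed and normalized variants, so their numbers will differ.

```python
import math
import re
from collections import Counter

docs = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "cats and dogs are pets",
]

def tokenize(text):
    # Very simple tokenizer: lowercase, then keep runs of letters
    return re.findall(r"[a-z]+", text.lower())

tokenized = [tokenize(d) for d in docs]
vocab = sorted({tok for toks in tokenized for tok in toks})

# Document frequency: in how many documents does each word appear?
df = Counter(tok for toks in tokenized for tok in set(toks))
n_docs = len(docs)

def tfidf_vector(tokens):
    counts = Counter(tokens)
    # One dimension per vocabulary word; words absent from this document get 0.0
    return [
        (counts[word] / len(tokens)) * math.log(n_docs / df[word])
        for word in vocab
    ]

vectors = [tfidf_vector(toks) for toks in tokenized]

for doc, vec in zip(docs, vectors):
    zeros = sum(1 for v in vec if v == 0.0)
    print(f"{doc!r}: {zeros}/{len(vocab)} entries are zero")  # 7/12 for each document
```

Even this toy vocabulary leaves most entries at zero; with a realistic vocabulary of thousands of words the share of zeros is far higher. Note also that `cat` and `cats` become separate dimensions, which shows how much the tokenizer choice matters.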

### What is JSON anyway?
JSON (JavaScript Object Notation) is a lightweight data interchange format that is easy for humans to read and write (it is not). In Python, for me at least, JSON looks like a dictionary with additional quotation marks, but it is really just text (a string), and its syntax is stricter: keys and strings must use double quotes, and it writes `true`, `false`, and `null` where Python writes `True`, `False`, and `None`. People use JSON because it works across different programming languages and platforms, which makes it a common choice for data exchange in web applications and APIs. It is a bit like a shared language that people who speak different languages use to communicate with each other.

For example, people around the world can read 0,1,2,3,4,5,6,7,8,9 and understand the meaning of these numbers even though they speak different languages. JSON is like a universal language for data exchange.

A simple example of JSON looks like this:

```json
{
    "name": "Alice",
    "age": 30,
    "city": "New York"
}
```

In Python, JSON usually arrives as a string, and the built-in `json` module turns it into a Python object:

```python
import json

# some JSON (a string that looks like a dictionary):
x = '{ "name":"John", "age":30, "city":"New York"}'

# parse x: this converts the JSON string into a Python dictionary
y = json.loads(x)

# the result is a Python dictionary:
print(y["age"])  # 30
```
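Going the other direction uses `json.dumps`. This sketch (the `profile` dictionary is made up) round-trips a dict through JSON text, shows how Python's `False` and `None` become `false` and `null`, and shows that single-quoted "JSON" is rejected.

```python
import json

profile = {"name": "Alice", "age": 30, "is_admin": False, "nickname": None}

# Python dict -> JSON text (a plain str)
text = json.dumps(profile)
print(text)  # {"name": "Alice", "age": 30, "is_admin": false, "nickname": null}

# JSON text -> Python dict again
back = json.loads(text)
print(back == profile)  # True

# JSON is stricter than Python dict syntax: single quotes are not valid JSON
try:
    json.loads("{'name': 'Alice'}")
except json.JSONDecodeError as err:
    print("Not valid JSON:", err.msg)
```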

To learn how to use JSON in Python, here is a simple and helpful resource from w3schools: [https://www.w3schools.com/python/python_json.asp](https://www.w3schools.com/python/python_json.asp)