# RAG Tools: Environment Setup

## 1. Introduction

This notebook focuses on setting up the basic environment for the RAG_tools project. We'll create a Conda environment, set up Docker containers for our databases, and create two essential utility tools: DockerComposeManager and config_utils.


## 2. Conda and Docker

We're using Conda for environment management and Docker for containerization. Conda allows us to create isolated Python environments, ensuring that our project's dependencies don't interfere with other projects or system-wide packages. This isolation makes it easier to reproduce our development environment across different machines.

Docker, on the other hand, provides a way to package our entire application, including its dependencies and configuration, into containers. These containers can run consistently across different environments, from development to production. Using Docker also allows us to easily manage and scale our database services.
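
Since everything that follows depends on both tools, it can help to confirm they are installed before going further. The snippet below is a minimal sketch, not part of the project's utilities: the helper name `find_missing_tools` is hypothetical, and it only checks that the executables are on `PATH`, not that the Docker daemon is running.

```python
import shutil

def find_missing_tools(tools=("conda", "docker")):
    """Return the names of required command-line tools not found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]

missing = find_missing_tools()
if missing:
    print(f"Missing from PATH: {', '.join(missing)}")
else:
    print("conda and docker are both available.")
```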


## 2. Project Structure

Our initial project structure:

```
RAG_tools/
├── config/
│   ├── docker-compose.yml
│   └── example.env
├── notebooks/
│   ├── 00_Environment_Setup.ipynb
│   ├── 01_Project_Overview_and_Architecture.ipynb
│   ├── 02_Model_Selection_and_Rationale.ipynb
│   ├── 03_Chunking_Strategies.ipynb
│   ├── 04_Database_Architecture_and_Integration.ipynb
│   └── 05_Implementation_CodeBaseImporter.ipynb
├── src/
│   └── utils/
│       ├── config_utils.py
│       └── DockerComposeManager.py
└── tests/
```
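
If you are starting from an empty directory, the folders in the tree above can be created in one step. This is a sketch under the assumption that only the directories need to exist up front; the function name `scaffold_project` and the `PROJECT_DIRS` list are illustrative and not part of the repository.

```python
from pathlib import Path

# Directory names taken from the tree above; the files are added later.
PROJECT_DIRS = ["config", "notebooks", "src/utils", "tests"]

def scaffold_project(root):
    """Create the directory skeleton under `root` and return the paths."""
    root = Path(root)
    created = []
    for rel in PROJECT_DIRS:
        path = root / rel
        # exist_ok=True makes the call safe to repeat on an existing tree.
        path.mkdir(parents=True, exist_ok=True)
        created.append(path)
    return created
```

Calling `scaffold_project("RAG_tools")` from your working directory would create the four folders and leave any existing files untouched.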

## 3. Environment Setup

### 3.1 Create and Activate Conda Environment

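First, let's create our Conda environment and activate it. Run the following commands from a terminal in your working directory, not from inside the Jupyter notebook. Note that `conda init` only needs to be run once; if it modifies your shell configuration, restart the terminal before running `conda activate`.
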
```bash
conda create -n ragtools python=3.12 -y
conda init
conda activate ragtools
```
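
To confirm that the environment was created, one option is to read the JSON that `conda env list --json` prints, which contains an `envs` list of environment paths. The sketch below assumes that layout; the helper names `env_names` and `conda_env_exists` are hypothetical.

```python
import json
import subprocess
from pathlib import Path

def env_names(conda_env_list_json):
    """Extract environment names from `conda env list --json` output."""
    envs = json.loads(conda_env_list_json).get("envs", [])
    # Each entry is a path; a named environment's directory carries its name.
    return {Path(p).name for p in envs}

def conda_env_exists(name):
    """Return True if an environment called `name` is known to conda."""
    result = subprocess.run(
        ["conda", "env", "list", "--json"],
        capture_output=True, text=True, check=True,
    )
    return name in env_names(result.stdout)
```

With the environment from this section in place, `conda_env_exists("ragtools")` should return `True`.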

### 3.2 Configure Conda Channels

Next, we'll add the necessary Conda channels:



```bash
%%bash
conda config --add channels conda-forge
conda config --add channels pytorch
```
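
The order of these two commands matters because `conda config --add` places the new channel at the front of the list, and channels earlier in the list take priority. The toy model below illustrates the resulting order; it assumes the configuration starts with only `defaults`, and `add_channel` is a stand-in for conda's behavior, not a conda API.

```python
def add_channel(channels, name):
    """Model `conda config --add channels NAME`: move NAME to the front."""
    return [name] + [c for c in channels if c != name]

channels = ["defaults"]  # assumed starting configuration
for name in ("conda-forge", "pytorch"):
    channels = add_channel(channels, name)

print(channels)  # ['pytorch', 'conda-forge', 'defaults']
```

In this model the channel added last (`pytorch`) ends up with the highest priority.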




### 3.3 Install Dependencies

Now, let's install our dependencies, including pgvector:


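```bash
%%bash
# Core libraries, database drivers, and notebook tooling
conda install transformers psycopg2 numpy matplotlib PyYAML jupyter pandas scikit-learn python-dotenv neo4j-python-driver docker-py ipykernel pgvector -c pytorch -c conda-forge -y
# CPU-only PyTorch build
conda install pytorch torchvision torchaudio cpuonly -c pytorch -y
# python-dotenv is already installed by the first command, so only python-docker is added here
conda install -y python-docker -c conda-forge
```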




### 3.4 Install Jupyter Kernel

To ensure we can use this environment in Jupyter notebooks, let's install the IPython kernel:


```bash
%%bash
python -m ipykernel install --user --name=ragtools
```
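
To check that the kernel was registered, you can inspect the output of `jupyter kernelspec list --json`, which contains a `kernelspecs` mapping keyed by kernel name. The following is a sketch under that assumption; `kernel_names` and `kernel_registered` are hypothetical helper names.

```python
import json
import subprocess

def kernel_names(kernelspec_json):
    """Extract kernel names from `jupyter kernelspec list --json` output."""
    return set(json.loads(kernelspec_json).get("kernelspecs", {}))

def kernel_registered(name="ragtools"):
    """Return True if Jupyter lists a kernel called `name`."""
    result = subprocess.run(
        ["jupyter", "kernelspec", "list", "--json"],
        capture_output=True, text=True, check=True,
    )
    return name in kernel_names(result.stdout)
```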


### 3.5 Switch to the New Kernel

After running these cells, you should restart your Jupyter kernel and select the 'ragtools' kernel to ensure you're working in the correct environment.

To do this in Jupyter Notebook:
1. Click on 'Kernel' in the top menu
2. Select 'Change kernel'
3. Choose 'ragtools' from the list

In JupyterLab:
1. Click on the kernel name in the top right corner of the notebook
2. Select 'ragtools' from the list of available kernels

After switching, you can verify the current environment by running:


```python
import sys
print(sys.executable)
```



This should point to the Python interpreter in your 'ragtools' Conda environment.
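
The same check can be made programmatic. The sketch below relies on the assumption that a named Conda environment lives in a directory with the same name (for example `.../envs/ragtools`), so the last component of `sys.prefix` identifies it; `running_in_env` is a hypothetical helper.

```python
import sys
from pathlib import Path

def running_in_env(name, prefix=None):
    """Return True if the interpreter's prefix is a directory called `name`."""
    prefix = sys.prefix if prefix is None else prefix
    return Path(prefix).name == name

print(sys.prefix)
print(running_in_env("ragtools"))
```

If the second line prints `False`, the notebook is most likely still attached to a different kernel.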

### 3.6 Verify Installation

Let's verify that our environment is set up correctly:

```python
import sys
import subprocess
import json
import importlib

# Conda package names whose Python import name differs from the package name.
IMPORT_NAMES = {
    'pytorch': 'torch',
    'scikit-learn': 'sklearn',
    'python-dotenv': 'dotenv',
    'neo4j-python-driver': 'neo4j',
    'docker-py': 'docker',
    'python-docker': 'docker',
}

def get_conda_versions():
    """Return {package name: version} from a single `conda list --json` call."""
    result = subprocess.run(['conda', 'list', '--json'],
                            capture_output=True, text=True, check=True)
    return {pkg['name']: pkg['version'] for pkg in json.loads(result.stdout)}

def check_importable(package_name):
    """Try to import the module that corresponds to a Conda package name."""
    try:
        importlib.import_module(IMPORT_NAMES.get(package_name, package_name))
        return "Importable"
    except ImportError:
        return "Not importable"

print(f"Python version: {sys.version}")
print("\nPackage versions:")

packages = ['pytorch', 'transformers', 'psycopg2', 'numpy', 'matplotlib',
            'jupyter', 'pandas', 'scikit-learn', 'python-dotenv',
            'neo4j-python-driver', 'docker-py', 'python-docker', 'pgvector']

# Query conda once instead of once per package.
try:
    conda_versions = get_conda_versions()
except (OSError, subprocess.CalledProcessError, json.JSONDecodeError) as e:
    print(f"Error getting conda info: {e}")
    conda_versions = {}

for package in packages:
    version = conda_versions.get(package, "Not found in conda environment")
    print(f"{package}: {version} ({check_importable(package)})")

# Additional check for PyTorch
try:
    import torch
    print(f"\nPyTorch import successful. Version: {torch.__version__}")
except ImportError:
    print("\nFailed to import PyTorch")

print("\nAll package checks completed.")
```


```
Python version: 3.12.4 | packaged by conda-forge | (main, Jun 17 2024, 10:23:07) [GCC 12.3.0]

Package versions:
pytorch: 2.3.1 (Importable)
transformers: 4.41.2 (Importable)
psycopg2: 2.9.9 (Importable)
numpy: 2.0.0 (Importable)
matplotlib: 3.8.4 (Importable)
jupyter: 1.0.0 (Importable)
pandas: 2.2.2 (Importable)
scikit-learn: 1.5.1 (Importable)
python-dotenv: 1.0.1 (Importable)
neo4j-python-driver: 5.22.0 (Importable)
docker-py: 7.1.0 (Importable)
python-docker: 0.2.0 (Importable)
pgvector: 0.3.0 (Importable)

PyTorch import successful. Version: 2.3.1

All package checks completed.
```
