# Conda Environment Management Practice Lab

**Estimated Time:** 45-60 minutes  
**Prerequisites:** Conda or Miniconda installed

---

## Learning Objectives

By completing this lab, you will:

- ✅ Create and manage Conda environments
- ✅ Install packages with conda and pip
- ✅ Export and recreate environments
- ✅ Handle environment dependencies
- ✅ Set up environments for AI/ML development
- ✅ Understand Python 3.10+, PyTorch 2.6.0+, CUDA 12.4+ requirements

---

## Exercise 1: Verify Conda Installation

First, let's check that Conda is properly installed and up to date.

In [None]:
# Check Conda version
!conda --version

In [None]:
# View Conda configuration
!conda info

In [None]:
# List all existing environments
!conda env list
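
The same listing is available in machine-readable form through `conda env list --json`, which helps when a script needs the environment names. The sketch below is only an illustration: `env_names_from_json` is a hypothetical helper, the sample paths are made up, and it assumes the JSON carries an `envs` key holding a list of environment paths.

```python
import json
import os
import shutil
import subprocess


def env_names_from_json(text):
    """Return the last path component of every environment path in the JSON.

    Assumes a payload shaped like {"envs": ["/path/to/base", "/path/to/envs/x"]}.
    The base environment appears under its install directory name, not "base".
    """
    data = json.loads(text)
    return [os.path.basename(path.rstrip("/\\")) for path in data.get("envs", [])]


# Made-up sample so the helper can be tried without conda installed
sample = '{"envs": ["/opt/miniconda3", "/opt/miniconda3/envs/basic-env"]}'
print(env_names_from_json(sample))

# Query the real installation only when conda is on PATH
if shutil.which("conda"):
    result = subprocess.run(
        ["conda", "env", "list", "--json"], capture_output=True, text=True
    )
    if result.returncode == 0:
        print(env_names_from_json(result.stdout))
```

Parsing JSON rather than the human-readable table avoids depending on column layout, which may differ between conda versions.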

---

## Exercise 2: Create Your First Environment

Create a basic Python environment with a specific Python version.

In [None]:
# Create environment with Python 3.10
!conda create -n basic-env python=3.10 -y

In [None]:
# List environments to verify creation
!conda env list

In [None]:
# View packages in the new environment
!conda list -n basic-env

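**Note:** To activate an environment in a terminal, use:

```bash
conda activate basic-env
```

In Jupyter notebooks, each `!` command runs in its own subshell, so an activation would not carry over to the next command. Instead, we pass the `-n` flag to `conda` commands to target a specific environment.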

---

## Exercise 3: Install Packages

Learn to install packages using conda and pip.

In [None]:
# Install common data science packages
!conda install -n basic-env numpy pandas matplotlib -y

In [None]:
# Check installed packages
!conda list -n basic-env | grep -E '(numpy|pandas|matplotlib)'

In [None]:
# Install a package with pip (inside conda environment)
!conda run -n basic-env pip install requests

In [None]:
# Verify pip installation
!conda run -n basic-env pip list | grep requests

---

## Exercise 4: Create an AI/ML Environment

Set up a proper environment for AI development with Python 3.10+, PyTorch 2.6.0+, and CUDA 12.4+ support.

In [None]:
# Create AI/ML environment with Python 3.10+
!conda create -n ai-dev python=3.10 -y

In [None]:
# Install PyTorch 2.6.0+ with CUDA 12.4 support
# Note: the official `pytorch` conda channel stopped receiving new releases after 2.5.x,
# so PyTorch 2.6.0+ is installed here from the PyTorch wheel index with pip.
# Adjust based on your hardware (CPU-only or GPU).
# For GPU with CUDA 12.4:
!conda run -n ai-dev pip install "torch>=2.6.0" torchvision torchaudio --index-url https://download.pytorch.org/whl/cu124
# For CPU-only (if no GPU):
# !conda run -n ai-dev pip install "torch>=2.6.0" torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu

In [None]:
# Install additional ML libraries
!conda install -n ai-dev numpy pandas scikit-learn matplotlib seaborn jupyter -y

In [None]:
# Install transformers and other AI libraries via pip
!conda run -n ai-dev pip install transformers huggingface-hub datasets accelerate

In [None]:
# Verify PyTorch installation and CUDA availability
# torch.version.cuda is None on CPU-only builds.
# Plain print() arguments avoid nested double quotes and IPython's {expression} expansion in ! commands.
!conda run -n ai-dev python -c "import torch; print('PyTorch version:', torch.__version__); print('CUDA available:', torch.cuda.is_available()); print('CUDA version:', torch.version.cuda)"
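
A shell one-liner becomes hard to read as checks are added, so the same verification can live in a small script. The following is a minimal sketch, assuming it is saved under a hypothetical name such as `check_torch.py` and run with `conda run -n ai-dev python check_torch.py`; `describe_torch` is an illustrative helper, not part of PyTorch.

```python
import importlib.util


def describe_torch():
    """Summarize the PyTorch install visible to the current interpreter."""
    # find_spec avoids an ImportError when torch is absent
    if importlib.util.find_spec("torch") is None:
        return {
            "installed": False,
            "version": None,
            "cuda_available": False,
            "cuda_build": None,
        }
    import torch

    return {
        "installed": True,
        "version": torch.__version__,
        "cuda_available": torch.cuda.is_available(),
        "cuda_build": torch.version.cuda,  # None on CPU-only builds
    }


info = describe_torch()
for key, value in info.items():
    print(f"{key}: {value}")
```

Returning a dictionary instead of printing inside the function keeps the result reusable, for example in an automated setup check.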

---

## Exercise 5: Environment Export and Recreation

Learn to export environments for reproducibility.

In [None]:
# Export environment to YAML file
!conda env export -n ai-dev > ai-dev-environment.yml

In [None]:
# View the exported file
!head -30 ai-dev-environment.yml

In [None]:
# Create a simplified requirements file (pip-style)
!conda run -n ai-dev pip freeze > ai-dev-requirements.txt

In [None]:
# View requirements file
!head -20 ai-dev-requirements.txt

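**To recreate the environment from YAML:**

```bash
conda env create -f ai-dev-environment.yml
```

**To install from requirements.txt:**

```bash
conda activate ai-dev
pip install -r ai-dev-requirements.txt
```

Note that `pip freeze` inside a conda environment can record conda-installed packages as local `file://` paths, so the YAML export is the more reliable way to recreate the full environment.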
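
A full `conda env export` pins each package as `name=version=build`, and build strings are usually platform-specific. Conda can omit them itself with `conda env export --no-builds`, or list only explicitly requested packages with `--from-history`. The sketch below just illustrates what dropping the build string means; `strip_build` is a hypothetical helper and the build strings are made up.

```python
def strip_build(spec):
    """Drop the build string from a conda spec written as 'name=version=build'."""
    parts = spec.split("=")
    # Keep at most name and version; shorter specs come back unchanged
    return "=".join(parts[:2])


exported = [
    "python=3.10.14=h955ad1f_1",     # made-up build strings for illustration
    "numpy=1.26.4=py310h5f9d8c6_0",
    "pip",
]
portable = [strip_build(spec) for spec in exported]
print(portable)
```

In practice the built-in flags are preferable to post-processing the file, since they also handle the nested `pip:` section of the YAML.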

---

## Exercise 6: Managing Multiple Environments

Practice working with multiple specialized environments.

In [None]:
# Create a testing environment
!conda create -n testing-env python=3.11 pytest pytest-cov black flake8 -y

In [None]:
# Create a data processing environment
!conda create -n data-proc python=3.10 pandas numpy scipy polars dask -y

In [None]:
# List all environments
!conda env list

In [None]:
# Compare Python versions across environments
print("=== basic-env Python version ===")
!conda run -n basic-env python --version
print("\n=== ai-dev Python version ===")
!conda run -n ai-dev python --version
print("\n=== testing-env Python version ===")
!conda run -n testing-env python --version

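---

## Exercise 7: Environment Troubleshooting

Learn to diagnose and fix common issues.

In [None]:
# Check for broken dependencies:
# list packages with their source channels (mixed channels are a common cause of conflicts),
# then let pip verify that installed packages have compatible requirements
!conda list -n ai-dev --show-channel-urls
!conda run -n ai-dev pip check

In [None]:
# Update conda itself (conda lives in the base environment)
!conda update -n base conda -y

In [None]:
# Update all packages in an environment (use cautiously)
# !conda update -n basic-env --all -y

In [None]:
# Clean up cached packages
!conda clean --all -y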

---

## Exercise 8: Jupyter Kernel Registration

Register conda environments as Jupyter kernels.

In [None]:
# Install ipykernel in the ai-dev environment
!conda install -n ai-dev ipykernel -y

In [None]:
# Register as Jupyter kernel
!conda run -n ai-dev python -m ipykernel install --user --name ai-dev --display-name "Python (AI-Dev)"

In [None]:
# List available kernels
!jupyter kernelspec list

**Note:** After registering, restart Jupyter and you'll see "Python (AI-Dev)" as a kernel option.
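
To confirm the registration from code rather than by eye, the output of `jupyter kernelspec list --json` can be parsed. This is a hedged sketch: `kernel_display_names` is a hypothetical helper, the sample path is made up, and it assumes the JSON has a top-level `kernelspecs` mapping whose entries hold a `spec` with a `display_name`.

```python
import json


def kernel_display_names(text):
    """Map kernel name -> display name from `jupyter kernelspec list --json` output."""
    specs = json.loads(text).get("kernelspecs", {})
    return {
        name: entry.get("spec", {}).get("display_name", name)
        for name, entry in specs.items()
    }


# Made-up sample mirroring the assumed JSON layout
sample = json.dumps({
    "kernelspecs": {
        "ai-dev": {
            "resource_dir": "/home/user/.local/share/jupyter/kernels/ai-dev",
            "spec": {"display_name": "Python (AI-Dev)", "language": "python"},
        }
    }
})
names = kernel_display_names(sample)
print(names)
print("ai-dev registered:", "ai-dev" in names)
```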

---

## Exercise 9: Environment Cleanup

Learn to remove environments you no longer need.

In [None]:
# Remove specific packages from an environment
!conda remove -n basic-env matplotlib -y

In [None]:
# Remove an entire environment (be careful!)
# Uncomment to actually remove
# !conda env remove -n testing-env -y

In [None]:
# Verify environments after cleanup
!conda env list

---

## Exercise 10: Best Practices Verification

Check that your setup follows best practices.

In [None]:
import subprocess
import json


def check_environment_setup():
    """Verify best practices for conda environments"""
    checks = []

    # Check 1: Python version 3.10+
    result = subprocess.run(
        ['conda', 'run', '-n', 'ai-dev', 'python', '--version'],
        capture_output=True, text=True
    )
    version_output = result.stdout.strip()
    if result.returncode == 0 and version_output:
        # Output looks like "Python 3.10.14"
        python_version = version_output.split()[-1]
        major, minor = map(int, python_version.split('.')[:2])
        python_ok = major == 3 and minor >= 10
    else:
        # Environment missing or command failed: report instead of raising
        python_version = 'Not found'
        python_ok = False
    checks.append({
        'name': 'Python 3.10+',
        'passed': python_ok,
        'value': python_version
    })

    # Check 2: PyTorch installed
    result = subprocess.run(
        ['conda', 'run', '-n', 'ai-dev', 'python', '-c',
         'import torch; print(torch.__version__)'],
        capture_output=True, text=True
    )
    pytorch_installed = result.returncode == 0
    pytorch_version = result.stdout.strip() if pytorch_installed else 'Not installed'
    checks.append({
        'name': 'PyTorch installed',
        'passed': pytorch_installed,
        'value': pytorch_version
    })

    # Check 3: Multiple environments exist
    result = subprocess.run(
        ['conda', 'env', 'list', '--json'],
        capture_output=True, text=True
    )
    env_data = json.loads(result.stdout)
    # Count paths that contain 'envs', which skips the base environment
    num_envs = len([e for e in env_data['envs'] if 'envs' in e])
    checks.append({
        'name': 'Multiple environments',
        'passed': num_envs >= 2,
        'value': f'{num_envs} environments'
    })

    # Print results
    print("\n=== Environment Setup Verification ===")
    for check in checks:
        status = "✅" if check['passed'] else "❌"
        print(f"{status} {check['name']}: {check['value']}")

    passed = sum(c['passed'] for c in checks)
    total = len(checks)
    print(f"\n📊 Results: {passed}/{total} checks passed")

    if passed == total:
        print("🎉 Your conda setup follows best practices!")
    else:
        print("⚠️  Some improvements recommended.")


check_environment_setup()
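
The version check above splits the `python --version` output by hand. An alternative sketch is shown below, using a regular expression and tuple comparison; `parse_major_minor` and `meets_minimum` are hypothetical helpers. Unlike the `major == 3` test in the function above, tuple comparison would also accept any future major version.

```python
import re


def parse_major_minor(version_output):
    """Extract (major, minor) from text like 'Python 3.10.14'; None if absent."""
    match = re.search(r"(\d+)\.(\d+)", version_output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def meets_minimum(version_output, minimum=(3, 10)):
    """True when the parsed version is at least `minimum` (tuple comparison)."""
    parsed = parse_major_minor(version_output)
    return parsed is not None and parsed >= minimum


for text in ("Python 3.10.14", "Python 3.9.18", ""):
    print(repr(text), meets_minimum(text))
```

Comparing tuples avoids the classic string-comparison trap in which `"3.9"` sorts after `"3.10"`.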

---

## Exercise 11: Real-World Data Analysis - Titanic Dataset

Practice using pandas in a conda environment with the famous Kaggle Titanic dataset.

**Learning Goals:**

- Use pandas for data exploration
- Work with real-world datasets
- Practice data analysis in conda environment
- Prepare for machine learning workflows

In [None]:
# First, ensure we're using the ai-dev environment with pandas
# Check if pandas is installed in our environment
!conda list -n ai-dev | grep pandas

In [None]:
# If pandas is not installed, install it
# !conda install -n ai-dev pandas -y

# For this exercise, we'll use Python directly
import sys
print(f"Python executable: {sys.executable}")
print(f"Python version: {sys.version}")

### Step 1: Load the Titanic Dataset

We'll use the Titanic dataset, which is available through seaborn or can be downloaded from Kaggle.
For this exercise, we'll use seaborn's `load_dataset` helper, which downloads the data on first use and therefore needs an internet connection.

In [None]:
# Import required libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# Set visualization style
sns.set_style('whitegrid')
%matplotlib inline

print("✅ Libraries imported successfully")

In [None]:
# Load the Titanic dataset from seaborn
titanic = sns.load_dataset('titanic')
print(f"Dataset shape: {titanic.shape}")
print(f"Rows: {titanic.shape[0]}, Columns: {titanic.shape[1]}")
print("\n✅ Titanic dataset loaded!")

### Step 2: Initial Data Exploration

Let's explore the structure and contents of the dataset.

In [None]:
# Display first few rows
print("=== First 5 Rows ===")
titanic.head()

In [None]:
# Dataset information
print("=== Dataset Info ===")
titanic.info()

In [None]:
# Column names and types
print("=== Column Names and Types ===")
print(titanic.dtypes)
print(f"\nTotal columns: {len(titanic.columns)}")

In [None]:
# Check for missing values
print("=== Missing Values ===")
missing = titanic.isnull().sum()
missing_pct = (missing / len(titanic) * 100).round(2)
missing_df = pd.DataFrame({
    'Missing Count': missing,
    'Percentage': missing_pct
})
print(missing_df[missing_df['Missing Count'] > 0].sort_values('Missing Count', ascending=False))

### Step 3: Basic Statistics

Calculate summary statistics for the dataset.

In [None]:
# Numerical columns statistics
print("=== Numerical Statistics ===")
titanic.describe()

In [None]:
# Categorical columns statistics
print("=== Categorical Statistics ===")
categorical_cols = ['survived', 'pclass', 'sex', 'embarked']
for col in categorical_cols:
    if col in titanic.columns:
        print(f"\n{col.upper()}:")
        print(titanic[col].value_counts())
        print(f"Unique values: {titanic[col].nunique()}")

### Step 4: Data Analysis Questions

Let's answer some interesting questions about the Titanic passengers.

In [None]:
# Question 1: What was the overall survival rate?
survival_rate = (titanic['survived'].sum() / len(titanic) * 100).round(2)
print(f"Overall Survival Rate: {survival_rate}%")
print(f"Survived: {titanic['survived'].sum()}")
print(f"Died: {len(titanic) - titanic['survived'].sum()}")

In [None]:
# Question 2: How did survival vary by gender?
print("=== Survival by Gender ===")
gender_survival = titanic.groupby('sex')['survived'].agg(['sum', 'count', 'mean'])
gender_survival['survival_rate_%'] = (gender_survival['mean'] * 100).round(2)
print(gender_survival)
print("\nConclusion:")
print(f"Female survival rate: {gender_survival.loc['female', 'survival_rate_%']:.1f}%")
print(f"Male survival rate: {gender_survival.loc['male', 'survival_rate_%']:.1f}%")

In [None]:
# Question 3: How did passenger class affect survival?
print("=== Survival by Passenger Class ===")
class_survival = titanic.groupby('pclass')['survived'].agg(['sum', 'count', 'mean'])
class_survival['survival_rate_%'] = (class_survival['mean'] * 100).round(2)
class_survival.index.name = 'Class'
print(class_survival)

In [None]:
# Question 4: What was the age distribution?
print("=== Age Statistics ===")
print(f"Average age: {titanic['age'].mean():.1f} years")
print(f"Median age: {titanic['age'].median():.1f} years")
print(f"Youngest passenger: {titanic['age'].min():.1f} years")
print(f"Oldest passenger: {titanic['age'].max():.1f} years")
print(f"Missing age values: {titanic['age'].isnull().sum()}")

### Step 5: Data Visualization

Create visualizations to better understand the data.

In [None]:
# Visualization 1: Survival by Gender
survival_by_gender = titanic.groupby(['sex', 'survived']).size().unstack()
# DataFrame.plot creates its own figure, so pass figsize here;
# a separate plt.figure() call would leave an empty figure behind
survival_by_gender.plot(kind='bar', stacked=False, color=['#d62728', '#2ca02c'], figsize=(10, 6))
plt.title('Survival Count by Gender', fontsize=14, fontweight='bold')
plt.xlabel('Gender')
plt.ylabel('Number of Passengers')
plt.legend(['Did Not Survive', 'Survived'], loc='upper right')
plt.xticks(rotation=0)
plt.tight_layout()
plt.show()
print("✅ Gender survival visualization complete")

In [None]:
# Visualization 2: Survival by Class
survival_by_class = titanic.groupby(['pclass', 'survived']).size().unstack()
# figsize is passed to DataFrame.plot for the same reason as above
survival_by_class.plot(kind='bar', color=['#d62728', '#2ca02c'], figsize=(10, 6))
plt.title('Survival Count by Passenger Class', fontsize=14, fontweight='bold')
plt.xlabel('Passenger Class (1=First, 2=Second, 3=Third)')
plt.ylabel('Number of Passengers')
plt.legend(['Did Not Survive', 'Survived'], loc='upper right')
plt.xticks(rotation=0)
plt.tight_layout()
plt.show()
print("✅ Class survival visualization complete")

In [None]:
# Visualization 3: Age Distribution
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Age histogram
axes[0].hist(titanic['age'].dropna(), bins=30, color='steelblue', edgecolor='black', alpha=0.7)
axes[0].axvline(titanic['age'].mean(), color='red', linestyle='--', linewidth=2, label=f"Mean: {titanic['age'].mean():.1f}")
axes[0].set_title('Age Distribution of Passengers', fontsize=12, fontweight='bold')
axes[0].set_xlabel('Age (years)')
axes[0].set_ylabel('Frequency')
axes[0].legend()
axes[0].grid(True, alpha=0.3)

# Age distribution by survival
survived_ages = titanic[titanic['survived'] == 1]['age'].dropna()
died_ages = titanic[titanic['survived'] == 0]['age'].dropna()
axes[1].hist([died_ages, survived_ages], bins=30, label=['Did Not Survive', 'Survived'],
             color=['#d62728', '#2ca02c'], alpha=0.7)
axes[1].set_title('Age Distribution by Survival', fontsize=12, fontweight='bold')
axes[1].set_xlabel('Age (years)')
axes[1].set_ylabel('Frequency')
axes[1].legend()
axes[1].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()
print("✅ Age distribution visualization complete")

In [None]:
# Visualization 4: Comprehensive heatmap
plt.figure(figsize=(12, 8))

# Create a pivot table for survival rate
survival_matrix = titanic.pivot_table(
    values='survived',
    index='pclass',
    columns='sex',
    aggfunc='mean'
)
sns.heatmap(survival_matrix, annot=True, fmt='.2%', cmap='RdYlGn',
            cbar_kws={'label': 'Survival Rate'}, linewidths=2)
plt.title('Survival Rate by Class and Gender', fontsize=14, fontweight='bold')
plt.xlabel('Gender')
plt.ylabel('Passenger Class')
plt.tight_layout()
plt.show()
print("✅ Heatmap visualization complete")

### Step 6: Advanced Analysis

Create derived features and perform deeper analysis.

In [None]:
# Create age groups (pd.cut bins are right-closed by default: (0, 12], (12, 18], ...)
titanic['age_group'] = pd.cut(titanic['age'],
                              bins=[0, 12, 18, 35, 60, 100],
                              labels=['Child', 'Teen', 'Young Adult', 'Adult', 'Senior'])
print("=== Survival Rate by Age Group ===")
# observed=False keeps every category and avoids the FutureWarning newer pandas emits for categorical keys
age_group_survival = titanic.groupby('age_group', observed=False)['survived'].agg(['mean', 'count'])
age_group_survival['survival_rate_%'] = (age_group_survival['mean'] * 100).round(2)
print(age_group_survival)
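
With its default `right=True`, `pd.cut` builds right-closed intervals, so an age of exactly 12 falls in the first bin and an age of 0 falls outside every bin. The sketch below mimics that rule with the standard library only, to make the edge cases visible; `age_group` here is a hypothetical stand-in for illustration, not how pandas implements binning.

```python
from bisect import bisect_left

EDGES = [0, 12, 18, 35, 60, 100]
LABELS = ["Child", "Teen", "Young Adult", "Adult", "Senior"]


def age_group(age):
    """Mimic right-closed bins (lo, hi]; return None outside (0, 100]."""
    # bisect_left counts the edges strictly below `age`,
    # so a value equal to an edge stays in the lower bin
    index = bisect_left(EDGES, age) - 1
    if 0 <= index < len(LABELS):
        return LABELS[index]
    return None


for age in (0, 0.42, 12, 12.5, 18, 60, 100, 101):
    print(age, age_group(age))
```

If newborns recorded as age 0 should count as children, `pd.cut` accepts `include_lowest=True` to close the first interval on the left.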

In [None]:
# Analyze family size impact
titanic['family_size'] = titanic['sibsp'] + titanic['parch'] + 1
titanic['is_alone'] = (titanic['family_size'] == 1).astype(int)
print("=== Family Size Analysis ===")
print(f"Average family size: {titanic['family_size'].mean():.2f}")
print(f"Passengers traveling alone: {titanic['is_alone'].sum()}")
print(f"Passengers with family: {len(titanic) - titanic['is_alone'].sum()}")

print("\n=== Survival by Family Status ===")
family_survival = titanic.groupby('is_alone')['survived'].agg(['mean', 'count'])
family_survival['survival_rate_%'] = (family_survival['mean'] * 100).round(2)
# groupby sorts keys, so 0 (with family) comes before 1 (alone)
family_survival.index = ['With Family', 'Alone']
print(family_survival)

In [None]:
# Fare analysis
print("=== Fare Statistics ===")
print(f"Average fare: ${titanic['fare'].mean():.2f}")
print(f"Median fare: ${titanic['fare'].median():.2f}")
print(f"Maximum fare: ${titanic['fare'].max():.2f}")

# Fare by class
print("\n=== Average Fare by Class ===")
fare_by_class = titanic.groupby('pclass')['fare'].agg(['mean', 'median', 'std'])
fare_by_class.columns = ['Average', 'Median', 'Std Dev']
print(fare_by_class.round(2))

### Step 7: Export Results

Save our analysis results for future use.

In [None]:
# Create a summary DataFrame
summary_stats = {
    'Total Passengers': len(titanic),
    'Survival Rate': f"{(titanic['survived'].mean() * 100):.2f}%",
    'Average Age': f"{titanic['age'].mean():.1f} years",
    'Female Survival Rate': f"{(titanic[titanic['sex']=='female']['survived'].mean() * 100):.2f}%",
    'Male Survival Rate': f"{(titanic[titanic['sex']=='male']['survived'].mean() * 100):.2f}%",
    'Class 1 Survival Rate': f"{(titanic[titanic['pclass']==1]['survived'].mean() * 100):.2f}%",
    'Class 2 Survival Rate': f"{(titanic[titanic['pclass']==2]['survived'].mean() * 100):.2f}%",
    'Class 3 Survival Rate': f"{(titanic[titanic['pclass']==3]['survived'].mean() * 100):.2f}%"
}
summary_df = pd.DataFrame.from_dict(summary_stats, orient='index', columns=['Value'])
summary_df.index.name = 'Metric'
print("=== Analysis Summary ===")
print(summary_df)

# Save to CSV
summary_df.to_csv('titanic_analysis_summary.csv')
print("\n✅ Summary saved to titanic_analysis_summary.csv")

In [None]:
# Export cleaned dataset with new features
export_cols = ['survived', 'pclass', 'sex', 'age', 'sibsp', 'parch',
               'fare', 'embarked', 'family_size', 'is_alone', 'age_group']
titanic_export = titanic[export_cols].copy()
titanic_export.to_csv('titanic_processed.csv', index=False)
print(f"✅ Processed dataset exported: {len(titanic_export)} rows, {len(titanic_export.columns)} columns")
print("Saved to: titanic_processed.csv")

### Key Findings from Analysis

**Summary of Insights:**

1. **Overall Survival**: ~38% of passengers survived the disaster
2. **Gender Impact**:
   - Women had a significantly higher survival rate (~74%)
   - Men had a much lower survival rate (~19%)
   - This gap is consistent with a "women and children first" evacuation policy
3. **Class Disparity**:
   - First-class passengers: ~63% survival rate
   - Second-class passengers: ~47% survival rate
   - Third-class passengers: ~24% survival rate
   - Social class was a major factor in survival
4. **Age Factor**:
   - Children had relatively higher survival rates
   - Average age of passengers was ~30 years
5. **Family Impact**:
   - Passengers traveling with family survived at a higher rate than those traveling alone (about 51% versus about 30% in this dataset)

### Skills Practiced

✅ **Conda Environment Usage**: Worked with pandas in a conda environment  
✅ **Data Loading**: Loaded dataset using seaborn  
✅ **Data Exploration**: Used pandas methods (head, info, describe)  
✅ **Data Quality Checks**: Identified and counted missing values  
✅ **Statistical Analysis**: Calculated survival rates and statistics  
✅ **Data Visualization**: Created multiple plot types  
✅ **Feature Engineering**: Created derived features (age_group, family_size, is_alone)  
✅ **Data Export**: Saved results to CSV files  

### Next Steps for ML

This analysis prepares the data for machine learning:

- Could build a classification model to predict survival
- Feature engineering is already started
- Missing values (for example in `age`) still need proper handling
- Could use scikit-learn, PyTorch, or TensorFlow

---

## Summary and Key Takeaways

### Commands Practiced

- `conda create` - Create new environments
- `conda install` - Install packages
- `conda list` - List packages
- `conda env list` - List environments
- `conda env export` - Export environment
- `conda env create` - Create from file
- `conda run` - Run commands in environment
- `conda remove` - Remove packages
- `conda env remove` - Remove environments
- `conda clean` - Clean cached files

### Technology Requirements Met

- ✅ **Python 3.10+**: Required for modern AI libraries
- ✅ **PyTorch 2.6.0+**: Latest features and performance
- ✅ **CUDA 12.4+**: GPU acceleration support
- ✅ **Isolated environments**: Avoid dependency conflicts

### Best Practices

- ✅ Avoid installing project packages in the base environment
- ✅ Use specific Python versions for reproducibility
- ✅ Export environments for sharing and deployment
- ✅ Prefer conda for large or compiled packages, and use pip for packages conda does not provide
- ✅ Register environments as Jupyter kernels
- ✅ Clean up unused environments regularly

### Common Issues Solved

- Environment isolation prevents conflicts
- Version pinning ensures reproducibility
- Kernel registration enables Jupyter integration
- Export/import enables team collaboration

### Next Steps

- Create project-specific environments
- Set up CI/CD with environment files
- Explore conda-forge channel
- Learn about mamba (faster conda alternative)