# Lesson 5.0: Complete Installation and Setup Guide
## Geoparsing and Sentiment Mapping in Python

This notebook installs all required packages and verifies that everything works before you start the main lesson.

## Step 1: Install Required Packages

**⚠️ Important:** Run this cell first and wait for it to complete. This may take 5-10 minutes.

In [None]:
# Install all required packages
import subprocess
import sys

packages = [
    'geoparser',
    'tqdm',
    'pandas', 
    'plotly',
    'mapclassify',
    'transformers',
    'torch',
    'spacy'
]

print("Installing required packages...")
for package in packages:
    print(f"Installing {package}...")
    subprocess.check_call([sys.executable, "-m", "pip", "install", package])
    
print("\n✅ All packages installed successfully!")
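Optionally, you can confirm what pip actually installed before moving on. The sketch below is not part of the lesson: `installed_versions` is a hypothetical helper that relies only on the standard-library `importlib.metadata` module (Python 3.8+). It looks up pip distribution names, which happen to match the import names for these eight packages but differ for some other libraries.

```python
from importlib import metadata

def installed_versions(distributions):
    """Map each pip distribution name to its installed version, or None if missing."""
    versions = {}
    for name in distributions:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # not installed in this kernel's environment
    return versions

report = installed_versions(['geoparser', 'tqdm', 'pandas', 'plotly',
                             'mapclassify', 'transformers', 'torch', 'spacy'])
for name, version in report.items():
    print(f"{name:<14} {version or 'NOT INSTALLED'}")
```

A `NOT INSTALLED` entry usually means that the `pip install` call for that package failed or ran in a different environment from the notebook kernel.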

## Step 2: Download spaCy Language Model

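The geoparser uses a spaCy pipeline to recognize place names in text. This lesson uses the transformer-based English model `en_core_web_trf`, which Step 3 passes to the geoparser, so it must be downloaded first.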

In [None]:
# Download the required spaCy model
# (imports repeated so this cell also works after a kernel restart)
import subprocess
import sys

print("Downloading spaCy language model...")
subprocess.check_call([sys.executable, "-m", "spacy", "download", "en_core_web_trf"])
print("\n✅ spaCy model downloaded successfully!")
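Because `en_core_web_trf` is large, you may prefer to skip the download when it is already present. This is a minimal sketch, assuming the model is installed as an importable Python package of the same name (which is how spaCy distributes its pipelines); `spacy_model_installed` and `ensure_spacy_model` are hypothetical helper names.

```python
import importlib
import importlib.util
import subprocess
import sys

def spacy_model_installed(model_name: str) -> bool:
    """Return True if a package called `model_name` can be found on the import path."""
    # Refresh the import system's caches in case pip just installed something
    importlib.invalidate_caches()
    return importlib.util.find_spec(model_name) is not None

def ensure_spacy_model(model_name: str = "en_core_web_trf") -> bool:
    """Download the model only if it is missing; return True if a download ran."""
    if spacy_model_installed(model_name):
        print(f"{model_name} is already installed, skipping download.")
        return False
    subprocess.check_call([sys.executable, "-m", "spacy", "download", model_name])
    return True
```

Calling `ensure_spacy_model()` instead of the `subprocess.check_call` download line makes the cell cheap to re-run.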

## Step 3: Verify Installation

Let's test that all components are working correctly.

In [None]:
# Test imports
import warnings

# Hide FutureWarnings from the libraries below so the output stays readable
warnings.simplefilter(action='ignore', category=FutureWarning)

try:
    from geoparser import Geoparser
    from tqdm.notebook import tqdm
    import pandas as pd
    import plotly.express as px
    import mapclassify as mc
    import spacy
    import torch
    import transformers

    print("✅ All imports successful!")

except ImportError as e:
    print(f"❌ Import error: {e}")
    print("Please re-run the installation cells above.")
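The `try` block above stops at the first failing import, so a second missing package only shows up after you fix the first one. The following sketch checks every module and reports all failures at once; `find_missing_modules` is a hypothetical helper built on the standard-library `importlib.import_module`.

```python
import importlib

def find_missing_modules(module_names):
    """Try every import and return {module_name: error_message} for the failures."""
    missing = {}
    for name in module_names:
        try:
            importlib.import_module(name)
        except ImportError as exc:  # also catches ModuleNotFoundError
            missing[name] = str(exc)
    return missing

problems = find_missing_modules([
    'geoparser', 'tqdm.notebook', 'pandas', 'plotly.express',
    'mapclassify', 'spacy', 'torch', 'transformers',
])
if problems:
    for name, message in problems.items():
        print(f"❌ {name}: {message}")
    print("Please re-run the installation cells above.")
else:
    print("✅ All imports successful!")
```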

In [None]:
# Test geoparser initialization
try:
    print("Initializing geoparser... (this may take a minute)")
    geo = Geoparser(spacy_model='en_core_web_trf',
                    transformer_model='dguzh/geo-all-distilroberta-v1',
                    gazetteer='geonames')

    # Quick test: parse one sentence that mentions a single place
    test_docs = geo.parse(["New York is a great city."])

    print("✅ Geoparser working correctly!")
    print(f"Test result: Found {len(test_docs[0].toponyms)} toponym(s)")

except Exception as e:
    print(f"❌ Geoparser error: {e}")
    print("Check that Step 2 completed and that you are online, since the "
          "transformer model and gazetteer data may need to be downloaded.")

## Step 4: Check Data Files

Verify that the data files saved in previous lessons are in the notebook's current working directory; the check below uses relative paths.

In [None]:
import os

# Check for required data files
required_files = [
    'df_virginia_toponym_sentiment_full.pickle',
    'df_virginia_geoparsed_complete.pickle',
    'df_geolocations_sentiments.pickle',
    'df_geolocations_sentiments_small.pickle'
]

print("Checking for required data files:")
missing_files = []

for file in required_files:
    if os.path.exists(file):
        print(f"✅ {file} - Found")
    else:
        print(f"❌ {file} - Missing")
        missing_files.append(file)

if missing_files:
    print(f"\n⚠️ Warning: {len(missing_files)} data file(s) missing.")
    print("Some parts of the lesson will require running previous lessons first.")
else:
    print("\n✅ All data files found!")
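`os.path.exists` resolves these relative names against the kernel's current working directory, which is usually the folder the notebook was opened from, and it also returns `True` for directories. If your pickles live in another folder, a `pathlib` variant with an explicit base directory may be more convenient. This is only a sketch: `find_missing_files` and its `data_dir` parameter are hypothetical names.

```python
from pathlib import Path

def find_missing_files(filenames, data_dir="."):
    """Return the names that are not regular files inside data_dir."""
    base = Path(data_dir)
    # is_file() is False both for missing paths and for directories
    return [name for name in filenames if not (base / name).is_file()]

# Relative data_dir values are resolved against this directory
print(f"Working directory: {Path.cwd()}")

missing = find_missing_files(
    ['df_virginia_toponym_sentiment_full.pickle',
     'df_virginia_geoparsed_complete.pickle'],
    data_dir=".",  # replace with the folder that holds your lesson data
)
print(f"Missing: {missing or 'none'}")
```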

## Step 5: Installation Complete!

🎉 **Congratulations!** Your environment is now set up for the geoparsing lesson.

### What we installed:
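- **geoparser**: For extracting place names (toponyms) from text and resolving them to locations in a gazetteer (here, GeoNames)
- **spaCy + en_core_web_trf**: Transformer-based English language pipeline that the geoparser uses to recognize toponyms
- **transformers, torch**: Deep-learning libraries behind the transformer model (`dguzh/geo-all-distilroberta-v1`) that the geoparser uses to resolve toponyms
- **plotly**: Interactive mapping and visualization
- **mapclassify**: Classification schemes (such as quantiles or natural breaks) for grouping values into map color classes
- **pandas, tqdm**: Data manipulation and progress bars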
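To make the data bucketing that mapclassify performs more concrete: quantile classification, one of the schemes mapclassify offers (as `mapclassify.Quantiles`), puts roughly the same number of observations in each class. The plain-Python sketch below only illustrates that idea with `statistics.quantiles` and `bisect`; it is not mapclassify's implementation, and its handling of ties and cut points may differ. `quantile_classes` is a hypothetical name.

```python
import bisect
import statistics

def quantile_classes(values, k=5):
    """Return (cut_points, class_index_per_value) for k quantile classes."""
    # k - 1 cut points that split the sorted data into k roughly equal groups
    cuts = statistics.quantiles(values, n=k)
    # bisect_left puts a value equal to a cut point into the lower class
    classes = [bisect.bisect_left(cuts, v) for v in values]
    return cuts, classes

cuts, classes = quantile_classes(list(range(1, 11)), k=5)
print(cuts)     # → [2.2, 4.4, 6.6, 8.8]
print(classes)  # → [0, 0, 1, 1, 2, 2, 3, 3, 4, 4]
```

Each of the five classes ends up with two of the ten values, which is the property that makes quantile maps show an even spread of colors.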

### Next Steps:
1. Close this notebook
2. Open `lesson_5_geoparsing_mapping.ipynb`
3. Follow the streamlined lesson!

### Troubleshooting:
If you encounter any issues:
1. Restart your kernel (Kernel → Restart)
2. Re-run this notebook from the beginning
3. Make sure you have enough free disk space (the models need about 2 GB)
4. Check your internet connection for model downloads
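The disk-space item in the list above can also be checked from code. This is a minimal sketch, assuming the models are cached on the same drive as your home directory (Hugging Face's default cache, for example, lives under `~/.cache/huggingface`); pass a different `path` if your caches are configured elsewhere. `has_enough_disk_space` is a hypothetical helper built on the standard-library `shutil.disk_usage`, and the 2 GB threshold is the rough figure quoted in the list.

```python
import os
import shutil

def has_enough_disk_space(path=None, required_gb=2.0):
    """Return (ok, free_gb) for the filesystem that contains `path`."""
    if path is None:
        path = os.path.expanduser("~")  # default: the home directory's drive
    free_gb = shutil.disk_usage(path).free / 1e9  # bytes -> GB (10**9 bytes)
    return free_gb >= required_gb, free_gb

ok, free_gb = has_enough_disk_space()
status = "✅" if ok else "⚠️"
print(f"{status} {free_gb:.1f} GB free (about 2 GB needed for the models)")
```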