Text2Map is a comprehensive toolkit for extracting geospatial insights from social media text. It combines named entity recognition (NER), geocoding, and interactive visualization to transform textual location mentions into dynamic spatio-temporal maps.
## Features

- Text Processing: Clean and preprocess social media text (remove RT markers, handles, emojis, links)
- Named Entity Recognition: Extract location entities (GPE, LOC, FAC) using fine-tuned BERT models
- Geocoding: Convert text locations to geographic coordinates using multiple geocoding services
- Visualization: Generate interactive heatmaps and time-series animations
- Temporal Analysis: Create cumulative and time-binned geospatial visualizations
- Animation Generation: Create GIF animations showing geospatial patterns over time
- Multi-scale Boundaries: Support for country, state, county, and city-level analysis
- Social Media Integration: Built-in Twitter/X API client for data collection
- Configurable Pipeline: YAML-based configuration for easy customization
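The text-cleaning step can be sketched with the standard library alone. Note that `clean_tweet` below is a hypothetical standalone helper for illustration, not the toolkit's actual API, and the non-ASCII filter is a crude stand-in for proper emoji handling:

```python
import re

def clean_tweet(text: str) -> str:
    """Minimal tweet cleaner: strips RT markers, @handles, links, and emojis."""
    text = re.sub(r"^RT\s+", "", text)         # leading retweet marker
    text = re.sub(r"@\w+", "", text)           # @handles
    text = re.sub(r"https?://\S+", "", text)   # links
    text = re.sub(r"[^\x00-\x7F]+", "", text)  # emojis (crude: drops all non-ASCII)
    return re.sub(r"\s+", " ", text).strip()   # collapse whitespace

print(clean_tweet("RT @user Flooding in Houston 🌀 https://t.co/abc"))
# → Flooding in Houston
```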
## Requirements

- Python 3.8+
- CUDA-compatible GPU (recommended for BERT inference)
## Installation

```bash
git clone https://github.com/yourusername/Text2Map.git
cd Text2Map
pip install -e .
```

For development:

```bash
git clone https://github.com/yourusername/Text2Map.git
cd Text2Map
pip install -e ".[dev]"
```

## Quick Start

```python
from text2map.core import TweetProcessor, GeocodeTweetProcessor
from text2map.models import BERTNERInference

# Process tweets
processor = TweetProcessor()
clean_tweets = processor.process_dataframe(tweets_df)

# Extract locations using BERT NER
ner = BERTNERInference()
locations = ner.process_dataframe(clean_tweets)

# Geocode locations
geocoder = GeocodeTweetProcessor()
geo_data = geocoder.geocode_data(locations)
```

### Command-Line Usage

```bash
# Process Twitter data end-to-end
python -m text2map.core.text_processor --input tweets.csv --output clean_tweets.csv

# Extract locations using BERT NER
python -m text2map.models.bert_ner --input clean_tweets.csv --output locations.csv

# Geocode and generate maps
python -m text2map.core.geocoder --input locations.csv --output data/processed/
```

## Example: Disaster Response

```python
from text2map.core import TweetProcessor, GeocodeTweetProcessor
from text2map.models import BERTNERInference

# 1. Process raw tweets
processor = TweetProcessor()
clean_tweets = processor.process_dataframe(hurricane_tweets)

# 2. Extract locations
ner = BERTNERInference(model_path="data/models/bert_ner")
locations = ner.process_dataframe(clean_tweets)

# 3. Geocode locations
geocoder = GeocodeTweetProcessor()
geo_data = geocoder.geocode_data(locations)
```

- Track mention clusters during emergency events
- Analyze temporal evolution of affected areas
- Generate real-time situation awareness maps
## Components

- **Text Processing** (`text2map.core.text_processor`)
  - Social media text cleaning
  - Noise removal (RT markers, handles, emojis, links)
  - Text normalization
- **Named Entity Recognition** (`text2map.models.bert_ner`)
  - BERT-based location extraction
  - Support for GPE, LOC, and FAC entity types
  - Confidence scoring and filtering
- **Geocoding** (`text2map.core.geocoder`)
  - Multiple geocoding service integration
  - Batch processing capabilities
  - Error handling and retry logic
- **Visualization** (`text2map.visualization`)
  - Interactive heatmap generation
  - Time-series animation creation
  - Multi-scale boundary overlays
## Pipeline

```
Raw Text → Text Processing → NER → Geocoding → Visualization
   ↓             ↓            ↓         ↓            ↓
CSV/JSON     Clean Text    Entities Coordinates  Maps/GIFs
```
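The time-binned aggregation behind the temporal visualizations can be sketched with the standard library alone; the `(timestamp, lat, lon)` record shape and `bin_by_day` helper are assumptions for illustration, not the toolkit's actual schema:

```python
from collections import defaultdict
from datetime import datetime

def bin_by_day(points):
    """Group (iso_timestamp, lat, lon) records into daily bins (one bin per animation frame)."""
    bins = defaultdict(list)
    for ts, lat, lon in points:
        day = datetime.fromisoformat(ts).date().isoformat()
        bins[day].append((lat, lon))
    return dict(bins)

points = [
    ("2023-08-29T10:15:00", 29.76, -95.36),
    ("2023-08-29T18:40:00", 29.55, -95.10),
    ("2023-08-30T07:05:00", 30.27, -97.74),
]
frames = bin_by_day(points)
print(sorted(frames))             # → ['2023-08-29', '2023-08-30']
print(len(frames["2023-08-29"]))  # → 2
```

A cumulative view simply unions each bin with all earlier bins before rendering.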
## Project Structure

```
Text2Map/
├── src/text2map/              # Main package
│   ├── core/                  # Core processing modules
│   │   ├── text_processor.py  # Tweet text cleaning
│   │   └── geocoder.py        # Geocoding and mapping
│   ├── models/                # Machine learning models
│   │   └── bert_ner.py        # BERT NER inference
│   ├── visualization/         # Mapping and visualization
│   └── utils/                 # Utilities and helpers
├── data/                      # Data storage
│   ├── boundaries/            # Geographic boundaries
│   │   ├── countries/         # Country-level boundaries
│   │   ├── counties/          # County-level boundaries
│   │   └── cities/            # City-level boundaries
│   ├── models/                # Pre-trained models
│   │   └── bert_ner/          # BERT NER model
│   └── processed/             # Output data
├── examples/                  # Usage examples
├── tests/                     # Test suite
├── docs/                      # Documentation
└── config/                    # Configuration files
```
## Configuration

Default paths:

- BERT Model: `data/models/bert_ner/`
- Boundaries: `data/boundaries/`
- Output: `data/processed/`
```python
# Custom model path
ner = BERTNERInference(model_path="path/to/custom/model")

# Custom boundary files
geocoder = GeocodeTweetProcessor(shapefile_path="path/to/states.shp")
```

## Geographic Boundaries

The toolkit uses several geographic boundary datasets:
- Natural Earth: Country and state boundaries (`data/boundaries/countries/`)
- US Census: County boundaries (`data/boundaries/counties/`)
- 500 Cities: City boundaries (`data/boundaries/cities/`)
## Testing

```bash
# Run all tests
pytest tests/

# Run a specific test file
pytest tests/test_text_processor.py

# Run with coverage
pytest --cov=text2map tests/
```

## Contributing

We welcome contributions! Please see our Contributing Guidelines for details.
```bash
git clone https://github.com/yourusername/Text2Map.git
cd Text2Map
pip install -e ".[dev]"
```

## License

This project is licensed under the MIT License - see the LICENSE file for details.
## Acknowledgments

- Natural Earth: Free vector and raster map data
- Hugging Face Transformers: BERT model implementation
- GeoPandas: Geospatial data processing
- Nominatim: Geocoding services
## Support

- Issues: GitHub Issues
- Discussions: GitHub Discussions
**Text2Map** - *Transform text into maps, reveal spatial stories*