NLP Litigation Scoring Pipeline

An NLP pipeline that cleans raw documents, extracts TF-IDF features and Word2Vec embeddings, and scores each document by its semantic similarity to a set of seed words.

Features

Text Processing

✔️ Automated text cleaning pipeline
✔️ Customizable stopword filtering
✔️ Punctuation and special character removal
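
The cleaning steps listed above could look roughly like the following. This is a minimal sketch, not the code in preprocessing.py: the function name `clean_text`, the tiny `DEFAULT_STOPWORDS` set, and the choice to drop digits are all assumptions made for illustration.

```python
import re
import string

# Hypothetical default stopwords; the project keeps its own lists under data/dictionaries/.
DEFAULT_STOPWORDS = frozenset({"the", "a", "an", "of", "and", "to", "in", "is"})

# Translation table that maps every ASCII punctuation character to a space.
_PUNCT_TABLE = str.maketrans({ch: " " for ch in string.punctuation})


def clean_text(text, stopwords=DEFAULT_STOPWORDS):
    """Lowercase the text, remove punctuation and special characters, drop stopwords."""
    text = text.lower().translate(_PUNCT_TABLE)
    # Assumption: anything that is not a lowercase ASCII letter or whitespace
    # (digits, symbols, accented characters) is treated as a special character.
    text = re.sub(r"[^a-z\s]", " ", text)
    return [token for token in text.split() if token not in stopwords]


if __name__ == "__main__":
    print(clean_text("The plaintiff filed a lawsuit in 2019, alleging fraud!"))
```

Passing a different `stopwords` set is how the customizable filtering would work in this sketch.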

Feature Extraction

🎯 TF-IDF vectorization with scikit-learn
🎯 Word2Vec embedding training
🎛️ Configurable hyperparameters via global_options.py
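
One way to keep the hyperparameters in a single module is sketched below. The class names, the `as_kwargs` helper, and every default value are hypothetical; only the parameter names are real, taken from gensim 4.x `Word2Vec` (`vector_size`, `window`, `min_count`, `workers`) and scikit-learn `TfidfVectorizer` (`max_df`, `min_df`, `ngram_range`). The actual contents of global_options.py may differ.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class Word2VecOptions:
    """Illustrative Word2Vec settings; the defaults here are assumptions."""
    vector_size: int = 100  # dimensionality of each word vector
    window: int = 5         # context window size
    min_count: int = 5      # ignore words rarer than this
    workers: int = 4        # training threads


@dataclass(frozen=True)
class TfidfOptions:
    """Illustrative TF-IDF settings; the defaults here are assumptions."""
    max_df: float = 0.95          # drop terms appearing in more than 95% of documents
    min_df: int = 2               # drop terms appearing in fewer than 2 documents
    ngram_range: tuple = (1, 1)   # unigrams only


W2V_OPTIONS = Word2VecOptions()
TFIDF_OPTIONS = TfidfOptions()


def as_kwargs(options):
    """Convert an options object to a dict that can be unpacked with **."""
    return asdict(options)
```

With this layout, a training script could call something like `Word2Vec(sentences, **as_kwargs(W2V_OPTIONS))`, so changing a hyperparameter means editing one file.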

Semantic Analysis

🔍 Seed word similarity scoring
📊 Document-level semantic profiling
💾 Results export to CSV
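
The scoring idea can be illustrated with plain Python. This sketch assumes a particular aggregation (for each in-vocabulary token, take the highest cosine similarity to any seed word, then average over the document); the repository may aggregate differently. The function names, the toy two-dimensional vectors, and the CSV column names are invented for the example.

```python
import csv
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def document_score(tokens, seed_words, vectors):
    """Mean, over in-vocabulary tokens, of the best similarity to any seed word."""
    seeds = [vectors[s] for s in seed_words if s in vectors]
    if not seeds:
        return 0.0
    sims = [max(cosine(vectors[t], s) for s in seeds) for t in tokens if t in vectors]
    return sum(sims) / len(sims) if sims else 0.0


def write_scores(scores, handle):
    """Write {doc_id: score} pairs as CSV rows to an open text handle."""
    writer = csv.writer(handle)
    writer.writerow(["doc_id", "score"])  # hypothetical column names
    for doc_id, score in scores.items():
        writer.writerow([doc_id, f"{score:.4f}"])


if __name__ == "__main__":
    # Toy embeddings standing in for a trained Word2Vec model.
    toy_vectors = {"lawsuit": (1.0, 0.0), "court": (0.6, 0.8), "picnic": (0.0, 1.0)}
    docs = {"doc_a": ["court", "lawsuit"], "doc_b": ["picnic", "unknown"]}
    scores = {d: document_score(t, ["lawsuit"], toy_vectors) for d, t in docs.items()}
    with open("scores_example.csv", "w", newline="") as fh:
        write_scores(scores, fh)
```

Tokens missing from the vocabulary are skipped rather than counted as zero, so short documents with many unknown words are not penalized twice.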

Pipeline Architecture

text_processing_pipeline/
│
├── data/
│   ├── input/                  # Raw documents (.txt)
│   ├── processed/              # Cleaned text and intermediate files
│   └── dictionaries/           # Seed words and stopwords
│
├── models/                     # Serialized Word2Vec models
│   └── word_vectors.kv         # Pretrained embeddings
│
├── outputs/
│   ├── word_similarities/      # Per-seed-word similarity scores
│   └── df_listscore.csv        # Final aggregated scores
│
├── config/
│   └── global_options.py       # Path and hyperparameter configuration
│
└── scripts/                    # Processing modules
    ├── NER_pipeline.py
    ├── preprocessing.py
    ├── ML.py
    ├── feature_engineering.py
    └── litigation_score_final.py
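
The layout above maps naturally onto a small path module. The sketch below only mirrors the directory and file names shown in the tree; the helper names `build_paths` and `ensure_directories` and the dictionary keys are hypothetical and not taken from global_options.py.

```python
from pathlib import Path


def build_paths(root):
    """Return the pipeline's paths, relative to the given project root."""
    root = Path(root)
    data = root / "data"
    outputs = root / "outputs"
    return {
        "input_dir": data / "input",                      # raw .txt documents
        "processed_dir": data / "processed",              # cleaned text, intermediates
        "dictionary_dir": data / "dictionaries",          # seed words and stopwords
        "model_path": root / "models" / "word_vectors.kv",
        "similarity_dir": outputs / "word_similarities",  # per-seed-word scores
        "score_csv": outputs / "df_listscore.csv",        # final aggregated scores
    }


def ensure_directories(paths):
    """Create every directory in the mapping, plus the parents of file paths."""
    for key, path in paths.items():
        target = path if key.endswith("_dir") else path.parent
        target.mkdir(parents=True, exist_ok=True)


if __name__ == "__main__":
    project_paths = build_paths("text_processing_pipeline")
    ensure_directories(project_paths)
```

Building the paths from a single root keeps the scripts runnable from any working directory.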

quinnieMA/Litigation-score
