oussamanaji/NLP

🧠 Natural Language Processing (NLP) Portfolio

License: MIT | Python 3.8+ | GitHub issues

🌟 Overview

Welcome to my Natural Language Processing (NLP) portfolio! This repository collects NLP projects that span classical techniques (LDA, CRFs, bag-of-words classifiers) and transformer-based deep learning models, ranging from enterprise-focused causal AI to bias mitigation in language models.

🚀 Projects

CausalNet: Enterprise-Focused Causal AI Framework

  • Description: A framework for enhancing the causal reasoning capabilities of large language models in enterprise applications.
  • Tech Stack: Python, PyTorch, Transformers, FastAPI
  • Models/Technologies: Cohere's aya-23-8B, HCCL, CIAT, MCR, ACRT, ECRT, MLCA, CKIS
  • View CausalNet Project

BiasGuard: Bias Mitigation in NLP using Multi-Agent Deep Reinforcement Learning

  • Description: Mitigates biases in AI-generated text using multi-agent deep reinforcement learning.
  • Tech Stack: Python, PyTorch, Transformers, Ray
  • Models/Technologies: Cohere's aya-23-8B, PPO, LoRA, Quantization
  • View BiasGuard Project | Open in Colab
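
BiasGuard lists PPO among its techniques. As a rough illustration only (not the project's code), the PPO clipped surrogate objective can be computed from per-action log-probabilities under the new and old policies plus advantage estimates. The function name and the default `clip_eps=0.2` are assumptions chosen for the example.

```python
import math
from typing import Sequence


def ppo_clip_objective(
    logp_new: Sequence[float],
    logp_old: Sequence[float],
    advantages: Sequence[float],
    clip_eps: float = 0.2,
) -> float:
    """Mean PPO clipped surrogate objective (to be maximized).

    Hypothetical helper: the loss minimized in training would be its negative.
    """
    if not (len(logp_new) == len(logp_old) == len(advantages)):
        raise ValueError("inputs must have equal length")
    total = 0.0
    for lp_new, lp_old, adv in zip(logp_new, logp_old, advantages):
        # Probability ratio pi_new(a|s) / pi_old(a|s), computed in log space.
        ratio = math.exp(lp_new - lp_old)
        # Keep the ratio inside [1 - eps, 1 + eps].
        clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        # Pessimistic bound: take the smaller of the unclipped and clipped terms.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)
```

With a ratio of 1.5 and a positive advantage, the clip caps the gain at 1.2 times the advantage; with a negative advantage the unclipped (more negative) term is kept, so the update is never rewarded for moving too far from the old policy.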

LDA-based Topic Modeling Engine for News Articles

  • Description: Uncovers hidden themes in large news article collections.
  • Tech Stack: Python, Gensim, NLTK, spaCy
  • Models/Technologies: LDA, TextRank, BART
  • View News Modeling LDA Project
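
The project uses Gensim's LDA. The sketch below is not that implementation: it is a small, dependency-free collapsed Gibbs sampler (a different inference method for the same model), meant only to show what LDA estimates, namely per-document topic counts and per-topic word distributions. The function names and the hyperparameters `alpha`, `beta`, `iters` are illustrative assumptions.

```python
import random
from collections import defaultdict


def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Collapsed Gibbs sampling for LDA on tokenized documents.

    Returns (phi, ndk): phi[k][w] is the smoothed word distribution of topic k,
    ndk[d][k] counts tokens in document d currently assigned to topic k.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    V, K = len(vocab), num_topics
    ndk = [[0] * K for _ in docs]               # document-topic counts
    nkw = [defaultdict(int) for _ in range(K)]  # topic-word counts
    nk = [0] * K                                # tokens per topic
    z = []                                      # topic assignment per token

    # Randomly assign every token to a topic.
    for d, doc in enumerate(docs):
        z.append([])
        for w in doc:
            k = rng.randrange(K)
            z[d].append(k)
            ndk[d][k] += 1
            nkw[k][w] += 1
            nk[k] += 1

    topics = range(K)
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the current token from the counts...
                k = z[d][i]
                ndk[d][k] -= 1
                nkw[k][w] -= 1
                nk[k] -= 1
                # ...then resample its topic from the full conditional.
                weights = [
                    (ndk[d][t] + alpha) * (nkw[t][w] + beta) / (nk[t] + V * beta)
                    for t in topics
                ]
                k = rng.choices(topics, weights=weights)[0]
                z[d][i] = k
                ndk[d][k] += 1
                nkw[k][w] += 1
                nk[k] += 1

    phi = [
        {w: (nkw[k][w] + beta) / (nk[k] + V * beta) for w in vocab}
        for k in range(K)
    ]
    return phi, ndk


def top_words(phi_k, n=3):
    """Most probable words of one topic distribution."""
    return sorted(phi_k, key=phi_k.get, reverse=True)[:n]
```

For real news corpora, a library implementation such as Gensim's is far faster; the sampler is only useful for seeing the count bookkeeping behind the model.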

Multi-Technique Text Classification System

  • Description: Combines multiple feature-extraction techniques and classifiers to demonstrate ensemble methods for text classification.
  • Tech Stack: Python, Scikit-learn, Gensim, Keras
  • Models/Technologies: CountVectorizer, TfidfVectorizer, Word2Vec, Doc2Vec, MultinomialNB, LogisticRegression, SVM, DecisionTree
  • View Text Classification Project
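
One of the listed feature extractors is TfidfVectorizer. A minimal sketch of the weighting it applies by default (raw term counts, smoothed IDF `ln((1 + n) / (1 + df)) + 1`, then L2 row normalization), assuming documents are already tokenized; the function name is hypothetical and tokenization in scikit-learn differs.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return (idf, rows) for tokenized docs; each row is an L2-normalized dict."""
    n = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for doc in docs for term in set(doc))
    # Smoothed IDF, as if one extra document contained every term once.
    idf = {t: math.log((1 + n) / (1 + df_t)) + 1.0 for t, df_t in df.items()}
    rows = []
    for doc in docs:
        tf = Counter(doc)  # raw counts
        weights = {t: c * idf[t] for t, c in tf.items()}
        norm = math.sqrt(sum(v * v for v in weights.values()))
        rows.append({t: v / norm for t, v in weights.items()} if norm else {})
    return idf, rows
```

A term that appears in every document gets an IDF of exactly 1, so it still contributes but is never up-weighted; the resulting rows can feed MultinomialNB, LogisticRegression or an SVM.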

Comprehensive Text Summarization Toolkit

  • Description: A versatile toolkit for both extractive and abstractive summarization.
  • Tech Stack: Python, PyTorch, Transformers, NLTK
  • Models/Technologies: TextRank, LexRank, BART, PEGASUS
  • View Text Summarization Project
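
TextRank, the first extractive method listed, ranks sentences by running PageRank over a sentence-similarity graph. The sketch below is a dependency-free illustration, assuming simple regex tokenization and the word-overlap similarity normalized by log sentence lengths from the original TextRank formulation; it is not the toolkit's implementation, and the function names are hypothetical.

```python
import math
import re


def _tokens(sentence):
    return re.findall(r"[a-z0-9']+", sentence.lower())


def _similarity(a, b):
    """Shared-word count divided by log sentence lengths (TextRank-style)."""
    if not a or not b:
        return 0.0
    denom = math.log(len(a)) + math.log(len(b))
    if denom <= 0:  # both sentences have a single token
        return 0.0
    return len(set(a) & set(b)) / denom


def textrank_summary(sentences, k=2, d=0.85, iters=50):
    """Return the k highest-ranked sentences in their original order."""
    toks = [_tokens(s) for s in sentences]
    n = len(sentences)
    # Weighted, undirected similarity graph (no self-loops).
    w = [[_similarity(toks[i], toks[j]) if i != j else 0.0 for j in range(n)]
         for i in range(n)]
    out_weight = [sum(row) for row in w]
    scores = [1.0] * n
    # Weighted PageRank by fixed-point iteration with damping factor d.
    for _ in range(iters):
        scores = [
            (1 - d) + d * sum(
                w[j][i] / out_weight[j] * scores[j]
                for j in range(n) if out_weight[j] > 0
            )
            for i in range(n)
        ]
    best = sorted(range(n), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(best)]
```

A sentence that shares no words with any other keeps the minimum score `1 - d`, so it is never picked over sentences that are connected in the graph.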

CRF-based Prescription Parser

  • Description: Extracts structured information from unstructured medical prescriptions.
  • Tech Stack: Python, sklearn-crfsuite, spaCy
  • Models/Technologies: Conditional Random Fields (CRF)
  • View Prescription Parser Project
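
A linear-chain CRF in sklearn-crfsuite is trained on sequences of per-token feature dictionaries. The sketch below shows one plausible feature function for prescription text; the feature names and the `UNITS` vocabulary are hypothetical choices for illustration, not the project's actual feature set.

```python
# Hypothetical unit vocabulary, for illustration only.
UNITS = {"mg", "ml", "mcg", "tablet", "tablets", "capsule", "capsules"}


def word2features(tokens, i):
    """Feature dict for token i, in the format sklearn-crfsuite accepts."""
    word = tokens[i]
    feats = {
        "bias": 1.0,
        "word.lower": word.lower(),
        "word.isdigit": word.isdigit(),
        "word.istitle": word.istitle(),
        "suffix3": word[-3:].lower(),
        "is_unit": word.lower() in UNITS,
    }
    # Context features from the neighbouring tokens.
    if i > 0:
        feats["-1:word.lower"] = tokens[i - 1].lower()
        feats["-1:word.isdigit"] = tokens[i - 1].isdigit()
    else:
        feats["BOS"] = True  # beginning of sequence
    if i < len(tokens) - 1:
        feats["+1:word.lower"] = tokens[i + 1].lower()
        feats["+1:is_unit"] = tokens[i + 1].lower() in UNITS
    else:
        feats["EOS"] = True  # end of sequence
    return feats


def sent2features(tokens):
    return [word2features(tokens, i) for i in range(len(tokens))]
```

Training would then pass `X = [sent2features(t) for t in token_lists]` together with aligned label sequences (for example hypothetical tags such as `DRUG`, `DOSE`, `UNIT`, `FREQ`) to `sklearn_crfsuite.CRF(algorithm="lbfgs", c1=0.1, c2=0.1).fit(X, y)`. The neighbour features let the CRF learn patterns such as "a number followed by a unit is a dose".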

PDF to Audiobook Converter

  • Description: Transforms PDF documents into high-quality, natural-sounding audiobooks.
  • Tech Stack: Python, PyPDF2, gTTS
  • Models/Technologies: Google Text-to-Speech
  • View PDF to Audiobook Project
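
A minimal sketch of the PDF-to-audio pipeline, assuming PyPDF2 2.x or later (which provides `PdfReader`) and the `gtts` package, which needs network access to Google's TTS service. The function names and the cleaning rules are hypothetical; the project's own script may differ.

```python
import re


def clean_page_text(text: str) -> str:
    """Join words hyphenated across line breaks and collapse whitespace."""
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", text)
    text = re.sub(r"\s+", " ", text)
    return text.strip()


def pdf_to_audiobook(pdf_path: str, mp3_path: str, lang: str = "en") -> None:
    # Imported lazily so the text-cleaning helper works without these packages.
    from PyPDF2 import PdfReader
    from gtts import gTTS

    reader = PdfReader(pdf_path)
    # extract_text() can return an empty string for image-only pages.
    pages = [clean_page_text(page.extract_text() or "") for page in reader.pages]
    text = " ".join(p for p in pages if p)
    if not text:
        raise ValueError("no extractable text; the PDF may be scanned images")
    gTTS(text=text, lang=lang).save(mp3_path)
```

Cleaning matters because PDF text extraction preserves layout line breaks and hyphenation, which a TTS engine would otherwise read as pauses or split words.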

Sentiment Analysis of Financial News Headlines

🛠 Skills & Technologies

  • Languages: Python, R
  • NLP Libraries: NLTK, spaCy, Gensim, Transformers
  • Machine Learning: Scikit-learn, TensorFlow, PyTorch
  • Deep Learning Models: BERT, RoBERTa, BART, PEGASUS, LLMs
  • Cloud & Deployment: Docker, AWS, Google Cloud Platform
  • Other: Git, FastAPI, Ray

💻 Installation & Usage

Each project has its own installation instructions and usage guide. Please refer to the individual project READMEs for detailed information.

General setup:

git clone https://github.com/oussamanaji/NLP.git
cd NLP
pip install -r requirements.txt

📊 Results & Achievements

  • CausalNet: Achieved 91% accuracy in causal reasoning tasks for enterprise scenarios.
  • BiasGuard: Reduced detected bias levels by 42% while improving perplexity by 30% and BLEU score by 38%.
  • Financial Sentiment Analysis: Attained 94% accuracy in predicting market trends based on news sentiment.
  • LDA Topic Modeling: Improved topic coherence by 25% compared to baseline models.
  • Text Classification System: Achieved 97% F1-score on multi-class classification tasks.
  • Text Summarization Toolkit: Reduced summary generation time by 40% while maintaining ROUGE-1 scores above 0.45.
  • Prescription Parser: Extracted medical entities with 98% accuracy, streamlining healthcare data processing.
  • PDF to Audiobook Converter: Processed over 10,000 pages with 99.9% text extraction accuracy and natural-sounding audio output.
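
Two of the metrics cited above have short standard definitions: ROUGE-1 F1 is the harmonic mean of unigram precision and recall against a reference summary, and perplexity is the exponential of the mean negative log-likelihood per token. The sketch below uses hypothetical function names and plain whitespace tokenization without stemming, so its ROUGE values will not exactly match the official ROUGE package.

```python
import math
from collections import Counter


def rouge1_f1(candidate: str, reference: str) -> float:
    """Unigram-overlap F1 between a candidate and a reference summary."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((cand & ref).values())  # clipped unigram matches
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


def perplexity(token_log_probs):
    """exp of the mean negative natural-log probability per token."""
    return math.exp(-sum(token_log_probs) / len(token_log_probs))
```

For example, a model that assigns probability 0.25 to every token has perplexity 4, and a lower perplexity means the model finds the text less surprising.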

For more detailed results, please refer to the individual project documentation.

🔮 Future Work

  • Integrate federated learning techniques for privacy-preserving NLP models
  • Explore cross-lingual transfer learning for improved multilingual support
  • Develop more robust evaluation metrics for bias and fairness in language models
  • Investigate the application of quantum computing in NLP tasks
  • Enhance CausalNet with real-time data processing capabilities for dynamic business environments
  • Implement adaptive learning algorithms in BiasGuard for continuous bias mitigation
  • Extend the Text Summarization Toolkit to support low-resource languages
  • Develop a multimodal NLP system integrating text, image, and audio data

🤝 Contributing

Contributions, issues, and feature requests are welcome! Feel free to check the issues page if you want to contribute. Whether you're fixing bugs, improving documentation, or proposing new features, your input is valuable.

📞 Contact

Mohamed Oussama Naji

Feel free to reach out for collaborations, questions, or discussions about NLP and AI!

📄 License

This project is MIT licensed.


Thank you for exploring my NLP portfolio. I'm passionate about pushing the boundaries of what's possible with natural language processing and always open to new opportunities and collaborations. Let's connect and create the future of AI together!
