- Overview
- Projects
- Skills & Technologies
- Installation & Usage
- Results & Achievements
- Future Work
- Contributing
- Contact
- License
Welcome to my Natural Language Processing (NLP) portfolio! This repository collects NLP projects that span classical techniques and deep learning models, ranging from causal reasoning for enterprise applications to bias mitigation in language models.
- Description: A groundbreaking framework enhancing causal reasoning capabilities of large language models for enterprise applications.
- Tech Stack: Python, PyTorch, Transformers, FastAPI
- Models/Technologies: Cohere's aya-23-8B, HCCL, CIAT, MCR, ACRT, ECRT, MLCA, CKIS
- View CausalNet Project
- Description: An innovative project aimed at mitigating biases in AI-generated text.
- Tech Stack: Python, PyTorch, Transformers, Ray
- Models/Technologies: Cohere's aya-23-8B, PPO, LoRA, Quantization
- View BiasGuard Project | Open in Colab
- Description: Uncovers hidden themes in large news article collections.
- Tech Stack: Python, Gensim, NLTK, spaCy
- Models/Technologies: LDA, TextRank, BART
- View News Modeling LDA Project
- Description: Demonstrates the power of ensemble methods in text classification.
- Tech Stack: Python, Scikit-learn, Gensim, Keras
- Models/Technologies: CountVectorizer, TfidfVectorizer, Word2Vec, Doc2Vec, MultinomialNB, LogisticRegression, SVM, DecisionTree
- View Text Classification Project
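As a rough illustration of the ensemble idea behind the text classification project, the sketch below combines several classifiers by hard (majority) voting. It is a minimal example under stated assumptions, not the project's actual code: `majority_vote` and `keyword_classifier` are hypothetical names, and the toy keyword rules stand in for trained models such as MultinomialNB or LogisticRegression.

```python
from collections import Counter
from typing import Callable, Sequence

# Hypothetical stand-in for a trained classifier: maps a text to a label.
Classifier = Callable[[str], str]


def majority_vote(classifiers: Sequence[Classifier], text: str) -> str:
    """Return the label predicted by most classifiers (hard voting).

    Ties go to the label that was predicted first, because Counter
    keeps equal counts in first-seen order.
    """
    votes = [clf(text) for clf in classifiers]
    return Counter(votes).most_common(1)[0][0]


def keyword_classifier(keyword: str, label: str, default: str) -> Classifier:
    """Build a toy rule-based classifier, for demonstration only."""
    return lambda text: label if keyword in text.lower() else default


if __name__ == "__main__":
    ensemble = [
        keyword_classifier("goal", "sports", "politics"),
        keyword_classifier("match", "sports", "politics"),
        keyword_classifier("election", "politics", "sports"),
    ]
    print(majority_vote(ensemble, "A late goal decided the match"))
```

With real models, each callable would wrap a fitted estimator's prediction for a single document; the voting logic stays the same.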
- Description: A versatile toolkit for both extractive and abstractive summarization.
- Tech Stack: Python, PyTorch, Transformers, NLTK
- Models/Technologies: TextRank, LexRank, BART, PEGASUS
- View Text Summarization Project
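For readers unfamiliar with extractive summarization, here is a simplified TextRank-style sketch in plain Python. It is an illustration only and not the toolkit's implementation: the regex sentence splitter, the Jaccard word-overlap similarity, and the function names (`split_sentences`, `textrank`, `summarize`) are all assumptions made for this example. The original TextRank paper uses a differently normalized overlap measure.

```python
import re
from typing import List


def split_sentences(text: str) -> List[str]:
    """Naive splitter on ., ! and ? (assumption; real code would use NLTK or spaCy)."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def similarity(a: str, b: str) -> float:
    """Jaccard word overlap between two sentences (a simplification)."""
    wa = set(re.findall(r"\w+", a.lower()))
    wb = set(re.findall(r"\w+", b.lower()))
    if not wa or not wb:
        return 0.0
    return len(wa & wb) / len(wa | wb)


def textrank(sentences: List[str], damping: float = 0.85,
             iterations: int = 50) -> List[float]:
    """Score sentences by PageRank-style iteration over a similarity graph."""
    n = len(sentences)
    if n == 0:
        return []
    # weights[i][j] is the edge weight between sentences i and j (no self-loops).
    weights = [[similarity(si, sj) if i != j else 0.0
                for j, sj in enumerate(sentences)]
               for i, si in enumerate(sentences)]
    out_sums = [sum(row) for row in weights]
    scores = [1.0 / n] * n
    for _ in range(iterations):
        updated = []
        for i in range(n):
            # Each neighbour j passes on a share of its score proportional
            # to the weight of its edge to i.
            rank = sum(weights[j][i] / out_sums[j] * scores[j]
                       for j in range(n) if out_sums[j] > 0)
            updated.append((1 - damping) / n + damping * rank)
        scores = updated
    return scores


def summarize(text: str, k: int = 2) -> List[str]:
    """Return the k highest-scoring sentences in their original order."""
    sentences = split_sentences(text)
    scores = textrank(sentences)
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]
```

A sentence that shares no words with the rest of the text receives only the baseline score `(1 - damping) / n`, so it is the last to be selected.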
- Description: Extracts structured information from unstructured medical prescriptions.
- Tech Stack: Python, sklearn-crfsuite, spaCy
- Models/Technologies: Conditional Random Fields (CRF)
- View Prescription Parser Project
- Description: Transforms PDF documents into high-quality, natural-sounding audiobooks.
- Tech Stack: Python, PyPDF2, gTTS
- Models/Technologies: Google Text-to-Speech
- View PDF to Audiobook Project
- Description: Applies advanced sentiment analysis techniques to financial news headlines.
- Tech Stack: Python, PyTorch, Transformers, Pandas
- Models/Technologies: BERT, RoBERTa
- View Financial News Sentiment Analysis Project
- Languages: Python, R
- NLP Libraries: NLTK, spaCy, Gensim, Transformers
- Machine Learning: Scikit-learn, TensorFlow, PyTorch
- Deep Learning Models: BERT, RoBERTa, BART, PEGASUS, LLMs
- Cloud & Deployment: Docker, AWS, Google Cloud Platform
- Other: Git, FastAPI, Ray
Each project has its own installation instructions and usage guide. Please refer to the individual project READMEs for detailed information.
General setup:
git clone https://github.com/oussamanaji/NLP.git
cd NLP
pip install -r requirements.txt
- CausalNet: Achieved 91% accuracy in causal reasoning tasks for enterprise scenarios.
- BiasGuard: Reduced detected bias levels by 42% while improving perplexity by 30% and BLEU score by 38%.
- Financial Sentiment Analysis: Attained 94% accuracy in predicting market trends based on news sentiment.
- LDA Topic Modeling: Improved topic coherence by 25% compared to baseline models.
- Text Classification System: Achieved 97% F1-score on multi-class classification tasks.
- Text Summarization Toolkit: Reduced summary generation time by 40% while maintaining ROUGE-1 scores above 0.45.
- Prescription Parser: Extracted medical entities with 98% accuracy, streamlining healthcare data processing.
- PDF to Audiobook Converter: Processed over 10,000 pages with 99.9% text extraction accuracy and natural-sounding audio output.
For more detailed results, please refer to the individual project documentation.
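The ROUGE-1 figure quoted for the summarization toolkit measures unigram overlap between a generated summary and a reference. The sketch below shows how such a score can be computed; it is a minimal illustration, not the evaluation code used in the project. The function name `rouge1` and the lowercase whitespace tokenization are assumptions, and established ROUGE implementations add options such as stemming.

```python
from collections import Counter
from typing import Dict


def rouge1(candidate: str, reference: str) -> Dict[str, float]:
    """Compute ROUGE-1 precision, recall and F1 from unigram overlap.

    Tokenization is a lowercase whitespace split (an assumption made
    for this sketch).
    """
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    # Clipped overlap: each word counts at most as often as it
    # appears in both texts.
    overlap = sum((cand & ref).values())
    precision = overlap / sum(cand.values()) if cand else 0.0
    recall = overlap / sum(ref.values()) if ref else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    print(rouge1("the cat sat on the mat", "the cat lay on the mat"))
```

In the example call, five of the six unigrams match in each direction, so precision, recall and F1 all equal 5/6.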
- Integrate federated learning techniques for privacy-preserving NLP models
- Explore cross-lingual transfer learning for improved multilingual support
- Develop more robust evaluation metrics for bias and fairness in language models
- Investigate the application of quantum computing in NLP tasks
- Enhance CausalNet with real-time data processing capabilities for dynamic business environments
- Implement adaptive learning algorithms in BiasGuard for continuous bias mitigation
- Extend the Text Summarization Toolkit to support low-resource languages
- Develop a multimodal NLP system integrating text, image, and audio data
Contributions, issues, and feature requests are welcome! Feel free to check the issues page if you want to contribute. Whether you're fixing bugs, improving documentation, or proposing new features, your input is valuable.
Mohamed Oussama Naji
- Email: mohamedoussama.naji@georgebrown.ca
- LinkedIn: Mohamed Oussama Naji
- GitHub: @oussamanaji
Feel free to reach out for collaborations, questions, or discussions about NLP and AI!
This project is MIT licensed.
Thank you for exploring my NLP portfolio. I'm passionate about pushing the boundaries of what's possible with natural language processing and always open to new opportunities and collaborations. Let's connect and create the future of AI together!