ryndovaira/README.md

Machine Learning Engineer | Data Scientist | Python Developer

Table of Contents

  1. Personal Information
  2. Summary
  3. Languages
  4. References
  5. Skills
  6. Professional Experience
  7. Education
  8. Certifications
  9. Publications

Personal Information


Summary

Machine Learning Engineer with over 9 years of experience, combining a strong background in software engineering with expertise in AI-driven solutions. Skilled in all stages of machine learning workflows, including data preprocessing, model development, and deployment. Recent projects include retrieval-augmented generation (RAG) systems and applications in healthcare. Equally comfortable working independently or as part of a team, with a collaborative and methodical approach to problem-solving. AWS-certified and open to expanding expertise into new areas.


Languages

  • English: B2+ (Upper Intermediate)
  • Russian: Native

References

A PDF of the recommendation letter is available via Google Drive. Further details, including referee contact information, are available upon request.


Skills

Core Skills

  • Programming Languages: Python, SQL
  • Databases: Relational (MySQL, SQLite), NoSQL (Mongo), Vector (Pinecone, LlamaIndex, Chroma)
  • Libraries & Frameworks:
    • Core ML Libraries: NumPy, Pandas, Scikit-learn, PyTorch, Keras
    • NLP & Specialized Tools: Hugging Face, LangChain, FAISS
    • Visualization: Matplotlib, Seaborn, Plotly
    • Others: FastAPI, MMDetection, lm-evaluation-harness, Supervisely
  • Models & APIs: OpenAI API (ChatGPT), LLaMA 2/3, Gemini
  • Machine Learning Techniques:
    • Retrieval-Augmented Generation (RAG)
    • Traditional / Deep Machine Learning
    • Natural Language Processing (NLP) / Natural Language Understanding (NLU)
    • Exploratory Data Analysis (EDA)
    • Data Processing & Analysis
    • Model Tuning and Evaluation
    • Data Visualization
  • Operating Systems: Linux, Windows
  • Development Tools: Docker, Docker Compose, Git, GitHub, GitLab, JupyterLab, Supervisely
  • Infrastructure & Platforms: AWS, Azure, Google Cloud, IBM Cloud Pak for Data, Cerebras
  • Experiment Tracking & Automation: MLflow, CI/CD Pipelines (GitHub Actions), Automated Testing
  • Soft Skills:
    • Approachable and supportive colleague
    • Collaborative and methodical in problem-solving
    • Adaptable to diverse roles and team dynamics
  • Domain Knowledge: Electronic Health Records (EHR)

Additional Skills

  • Programming Languages: C++, Java, R, SPARQL (AnzoGraph DB)
  • Libraries & Frameworks: Cython, Qt, pyTelegramBotAPI
  • Development Tools: Google Test, Mercurial, TeamCity

Professional Experience

Machine Learning Engineer

Company Name: Quantori LLC
Location: Remote
Dates of Employment: August 2023 – November 2024

Project: Development of a Chatbot System for Oncology Treatment Support

Technologies Used:

  • Programming Languages: Python
  • Libraries & Frameworks: Pandas, NumPy, Scikit-learn, Hugging Face Transformers, FAISS, Pinecone, LangChain, lm-evaluation-harness
  • Machine Learning Models: OpenAI API (GPT-4), LLaMA 2/3, RAG pipeline
  • Development Tools: JupyterLab, Pytest, FastAPI, Git, Docker (Docker Compose), Streamlit
  • Infrastructure & Platforms: Linux, IBM Cloud Pak for Data, Cerebras

Responsibilities:

  • Processed and structured raw datasets, including clinical records, genomic data, pathology reports, and laboratory results. Collaborated with domain experts to clarify medical terminology and resolve data inconsistencies.
  • Performed EDA and created visualizations to analyze trends and improve data quality.
  • Designed and implemented a RAG pipeline that used OpenAI API models for public data and self-hosted LLaMA models for private data to provide evidence-based treatment recommendations.
  • Fine-tuned LLaMA models using Cerebras infrastructure for private HIPAA-compliant data and OpenAI API models for public data, ensuring tone, relevance, and clinical alignment.
  • Deployed the chatbot on-premise using FastAPI and developed a basic interface with Streamlit for user interaction and testing.
  • Stored vector embeddings with FAISS during prototyping and transitioned to Pinecone for scalable deployment.
  • Configured CI/CD pipelines with GitHub Actions to automate deployment workflows.
  • Conducted basic testing of chatbot responses, including tone, relevance, and functionality, refining outputs based on feedback.
  • Translated high-level research ideas into actionable technical tasks. Prepared regular progress reports to track milestones and communicate findings effectively.
  • Onboarded new team members and supported them during their initial project phases.
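
A minimal sketch of the retrieval step behind a RAG pipeline like the one described above. It is not the project's code: the toy index, the three-dimensional vectors, and the names `cosine_similarity`, `retrieve`, and `build_prompt` are assumptions for illustration. In practice the embeddings would come from an embedding model and the search would run in a vector store such as FAISS or Pinecone.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query_vec, index, top_k=2):
    """Return the top_k passages whose vectors are most similar to the query."""
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in index]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:top_k]]


def build_prompt(question, passages):
    """Assemble a prompt that grounds the model's answer in the retrieved passages."""
    context = "\n".join(f"- {p}" for p in passages)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )


# Hypothetical toy index: (passage text, embedding) pairs with made-up vectors.
TOY_INDEX = [
    ("passage A", [1.0, 0.0, 0.0]),
    ("passage B", [0.0, 1.0, 0.0]),
    ("passage C", [0.9, 0.1, 0.0]),
]

passages = retrieve([1.0, 0.0, 0.0], TOY_INDEX, top_k=2)
print(build_prompt("Which passages are relevant?", passages))
```

The prompt built this way would then be sent to the language model. Keeping retrieval separate from generation is what lets the vector store be swapped (for example, from FAISS to Pinecone) without touching the prompt logic.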

Machine Learning Engineer

Company Name: Quantori LLC
Location: Remote
Dates of Employment: September 2022 – August 2023

Project: Classifying and Scoring Pulmonary Edema (a condition caused by excess fluid in the lungs) on Chest X-Ray Images

Technologies Used:

  • Programming Languages: Python
  • Libraries & Frameworks: Pandas, NumPy, Scikit-learn, Matplotlib, Seaborn, MMDetection, Supervisely
  • Experiment Tracking: MLflow
  • Development Tools: JupyterLab, Pytest, Git, GitHub
  • Infrastructure & Platforms: AWS (S3, EC2), Linux

Responsibilities:

  • Processed labeled datasets provided by domain experts, converting them into COCO format to enable compatibility with MMDetection workflows.
  • Collaborated on configuring MLflow for shared experiment tracking and performance monitoring, supporting reproducibility across the team.
  • Assisted in conducting experiments with MMDetection, helping to evaluate and refine models for classifying and scoring edema features.
  • Designed visualizations to present results and analyze model outputs, providing insights into model behavior and debugging processes.
  • Provided software development support for the project, contributing to the research team’s goal of publishing a peer-reviewed study in Radiology Advances.
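
A rough sketch of what converting box annotations into COCO format can look like. The input record layout, the class names `feature_a` and `feature_b`, and the function name `to_coco` are hypothetical and do not reflect the project's actual data. The output follows the standard COCO detection layout, where `bbox` is `[x, y, width, height]`.

```python
def to_coco(records, class_names):
    """Convert simple per-image box records into a COCO-style detection dict.

    Each record is assumed (for this sketch) to look like:
    {"file_name": str, "width": int, "height": int,
     "boxes": [{"label": str, "x_min": .., "y_min": .., "x_max": .., "y_max": ..}]}
    """
    # COCO category ids conventionally start at 1.
    categories = [{"id": i, "name": name} for i, name in enumerate(class_names, start=1)]
    name_to_id = {c["name"]: c["id"] for c in categories}

    images, annotations = [], []
    ann_id = 1
    for image_id, rec in enumerate(records, start=1):
        images.append({
            "id": image_id,
            "file_name": rec["file_name"],
            "width": rec["width"],
            "height": rec["height"],
        })
        for box in rec["boxes"]:
            # Corner coordinates become top-left corner plus width and height.
            w = box["x_max"] - box["x_min"]
            h = box["y_max"] - box["y_min"]
            annotations.append({
                "id": ann_id,
                "image_id": image_id,
                "category_id": name_to_id[box["label"]],
                "bbox": [box["x_min"], box["y_min"], w, h],
                "area": w * h,
                "iscrowd": 0,
            })
            ann_id += 1

    return {"images": images, "annotations": annotations, "categories": categories}


# Hypothetical usage with a single made-up record.
example = [{
    "file_name": "scan_001.png",
    "width": 200,
    "height": 100,
    "boxes": [{"label": "feature_b", "x_min": 10, "y_min": 20, "x_max": 110, "y_max": 70}],
}]
coco = to_coco(example, ["feature_a", "feature_b"])
```

The resulting dict can be written out with `json.dump` and referenced as an annotation file in a detection framework's dataset config.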

Data Scientist

Company Name: Quantori LLC
Location: Remote
Dates of Employment: September 2021 – August 2022

Project: Discovering Genetic Patterns in Autoimmune Disease Patients

Technologies Used:

  • Programming Languages: Python, SQL
  • Libraries & Frameworks: Pandas, NumPy, Scikit-learn, Matplotlib, Seaborn, HDBSCAN, SHAP
  • Development Tools: JupyterLab, Pytest, Git, GitHub
  • Infrastructure & Platforms: AWS (Aurora, S3, SageMaker), Linux

Responsibilities:

  • Collaborated with researchers to analyze datasets from UK Biobank, supporting their efforts to uncover trends and generate insights into autoimmune diseases.
  • Conducted extensive exploratory data analysis (EDA) to identify potential patterns and validate hypotheses generated by researchers.
  • Performed patient segmentation through clustering techniques to group individuals with shared biological characteristics, aiding researchers in exploring sub-populations of interest.
  • Applied SHAP to investigate feature importance during clustering, enhancing the interpretability and reliability of segmentation results.
  • Retrieved and analyzed longitudinal electronic health records (EHR) from AWS Aurora databases to provide researchers with detailed data summaries.
  • Automated experimental pipelines and implemented testing processes to ensure reliable workflows and consistent data quality.
  • Studied autoimmune diseases, traditional treatments, and ICD-10/9 classifications to gain the necessary domain knowledge for clustering and data analysis tasks.
  • Prepared detailed reports summarizing EDA and clustering results, enabling researchers to identify potential genetic patterns and areas for further study.
  • Reviewed the work of junior engineers, provided guidance, and resolved technical challenges to maintain project momentum.

Data Engineer

Company Name: Quantori LLC
Location: Remote
Dates of Employment: July 2021 – September 2021

Project: Organizing Bioinformatics Data and Project Resources

Technologies Used:

  • Programming Languages: R
  • Infrastructure & Platforms: AWS (S3, EC2), Linux

Responsibilities:

  • Collaborated with the lead data engineer to structure raw bioinformatics data provided in diverse file formats, ensuring accessibility and usability for analysis.
  • Organized and consolidated the client’s project resources, restructuring R scripts and related materials into a clear and centralized repository to enable effective collaboration.
  • Reviewed and refined existing R scripts to clean and process data, ensuring compatibility with the project’s requirements.
  • Worked with the client to retrieve missing files and resolve inconsistencies in the provided data and resources.

Python/Data Science Tutor

Company Name: LevelUp
Location: Remote (Russia)
Dates of Employment: May 2020 – July 2021

Technologies Used:

  • Programming Languages: Python, SQL
  • Libraries & Frameworks: Pandas, NumPy, Matplotlib, Seaborn, Scikit-learn, Keras, XGBoost, LightGBM, Imblearn
  • Development Tools: JupyterLab, Pytest, Git, GitHub

Responsibilities:

  • Designed and developed comprehensive teaching materials, including lecture slides, coding exercises, and projects tailored for beginner-level students in small groups (10-20 participants).
  • Delivered interactive lessons on Python fundamentals, object-oriented programming, and core data science workflows.
  • Mentored students in Python scripting, data preprocessing, and feature engineering techniques, ensuring a strong foundation in practical programming skills.
  • Evaluated homework assignments and projects to track student progress, providing constructive feedback to support their learning.
  • Taught machine learning concepts, including regression models, decision trees, and ensemble techniques, with real-world examples.
  • Introduced students to hands-on data manipulation, exploratory data analysis (EDA), and creating end-to-end pipelines using Python libraries.
  • Guided students in using JupyterLab for effective coding workflows and testing tools like Pytest to keep their code reliable and reproducible.

Freelance Data Scientist

Location: Remote
Dates of Employment: October 2020 – June 2021

Summary:
Worked independently on diverse data science projects, focusing on developing machine learning models, performing data analysis, and extracting insights from complex datasets.

Technologies Used:

  • Programming Languages: Python
  • Libraries & Frameworks: Pandas, Scikit-learn, NumPy, Matplotlib, Seaborn, XGBoost, LightGBM, Keras, BERT
  • Development Tools: JupyterLab, Git, GitHub

Contributions:

  • Payment Behavior Analysis: Built machine learning models, such as decision trees, logistic regression, and gradient boosting, to analyze payment behavior and classify users into distinct customer groups.
  • Customer Satisfaction Improvement: Conducted sentiment analysis on Google Play reviews using NLP techniques, including tokenization and text vectorization, to identify key improvement areas and enhance user satisfaction.
  • Click Prediction Model: Developed a predictive model using algorithms like random forests and logistic regression to determine the likelihood of users clicking on ads in a web browser.
  • Data Exploration and Visualization: Conducted exploratory data analysis (EDA) and created visualizations using Matplotlib and Seaborn to effectively communicate insights.
  • NLP and Text Analytics: Applied NLP techniques, including BERT-based text classification, to extract insights from unstructured text data.

Data Scientist (NLP & Text Analytics)

Company Name: MTS AI
Location: Saint-Petersburg, Russia (Remote)
Dates of Employment: June 2020 – October 2020

Project: Sentiment Analysis and Data Clustering

Technologies Used:

  • Programming Languages: Python 3
  • Libraries & Frameworks: Pandas, Scikit-learn, XGBoost, LightGBM, Imblearn, NLTK, NumPy, Matplotlib, Seaborn, pyTelegramBotAPI
  • Development Tools: JupyterLab, Pytest, Git, GitLab
  • Operating Systems: Linux

Responsibilities:

  • Clustered and classified data from customer reviews and conversations with bots and agents for sentiment analysis.
  • Developed supplementary Python scripts and demo projects, including a Telegram Bot for data interaction.
  • Performed data preprocessing, feature engineering, and model training using machine learning algorithms.
  • Created unit tests using Pytest to ensure robustness and quality of data analysis pipelines.

Lead ASR Engineer

Company Name: MTS AI
Location: Saint-Petersburg, Russia (Remote)
Dates of Employment: March 2019 – October 2020

Project: Development of an Automatic Speech Recognition Application

Technologies Used:

  • Programming Languages: C++, Python 3, Cython
  • Libraries & Frameworks: Kaldi
  • Development Tools: Google Test, Pytest, Docker, Git, GitLab
  • Operating Systems: Linux

Responsibilities:

  • Created decoders using the Kaldi toolkit for automatic speech recognition (ASR).
  • Led a small team of engineers, including task assignment, code review, and mentoring.
  • Developed unit tests using Google Test and Pytest to ensure high-quality code.
  • Set up and maintained pipelines for build preparation and testing.
  • Developed Python function wrappers using Cython to integrate C++ functionality.

Software Engineer

Company Name: Speech Technology Center (STC Group)
Location: Saint-Petersburg, Russia
Dates of Employment: November 2017 – March 2019

Project: Development of a Large-Scale C++ Project

Technologies Used:

  • Programming Languages: C++, Python 3, Java
  • Development Tools: Google Test, Pytest, SWIG 3, Docker
  • Version Control & CI/CD: Git, GitLab, Mercurial, TeamCity
  • Operating Systems: Linux, Windows

Responsibilities:

  • Contributed to the development of a large-scale speech SDK project in C++ with over 10 years of active development.
  • Created supplementary scripts in Python 3 to enhance project functionality.
  • Took part in demo projects to showcase the product to clients.
  • Developed a Java function wrapper using SWIG 3 for the C++ project.
  • Managed build configurations in TeamCity for continuous integration.
  • Prepared unit tests using Google Test to ensure code quality.
  • Participated in pre-release integration testing with Java and Python.

Software Engineer

Company Name: Russian Institute of Radio Navigation and Time
Location: Saint-Petersburg, Russia
Dates of Employment: March 2015 – November 2017

Project: Software Development and Legacy Code Migration

Technologies Used:

  • Programming Languages: C, C++
  • Libraries & Frameworks: Qt

Responsibilities:

  • Supported and enhanced software solutions for navigation systems using C, C++, and Qt.
  • Assisted in migrating legacy code from assembly language to C, improving software stability and maintainability.
  • Worked on microcontroller programming for embedded system integration.
  • Developed an FTP client-server application for secure data transfer over internal networks, meeting specific technical requirements.

Education

Saint Petersburg State Electrotechnical University "LETI," Russia

  • Master's in Computer Engineering and Informatics
    • Sep 2014 – Jun 2016
  • Bachelor's in Computer Engineering and Informatics
    • Sep 2010 – Jun 2014

Certifications

Credly Profile: irina-ryndova

  • AWS Certified Machine Learning – Specialty
    Issued: Jan 2023 | Expires: Jan 2026

  • IBM Data Science Specialization (Coursera)

  • Applied Data Science Specialization (Coursera)

  • Introduction to Data Science Specialization (Coursera)

  • Stanford University - Machine Learning (Coursera)

  • Other Courses on Coursera:

    • Python for Data Science and AI
    • Data Analysis with Python
    • Data Visualization with Python
    • Databases and SQL for Data Science

Publications

"Explainable AI to Identify Radiographic Features of Pulmonary Edema"

  • Description: A study developing a deep learning method to identify radiographic features of pulmonary edema, a condition caused by excess fluid in the lungs.
  • Contribution: Software development, validation, and manuscript review and editing.
  • Published in: Radiology Advances, May 2024.
  • DOI: 10.1093/radadv/umae003

Pinned

  1. quantori/edema-quantification (Public)

    This project is dedicated to the detection and classification of radiographic features associated with pulmonary edema

    Jupyter Notebook 3 2

  2. imdb-sentiment-classifier-keras-rnn (Public)

    HTML 1