In [64]:
bio_text = """ 
Mangesh Sambare is an Indian Junior Data Scientist, AI Trainer, Technical Mentor, and Startup Professional with strong hands-on experience in Machine Learning, Deep Learning, Natural Language Processing, Data Analytics, and Generative AI. He is professionally associated with Affordable AI Technology, a data science and AI-focused organization that provides live project-based training, internships, and real-world industry exposure to students and freshers. He actively contributes to both technical development and training delivery within the organization.

His primary technical expertise lies in Python and SQL for data analysis, preprocessing, automation, and backend logic. He works extensively with machine learning and deep learning frameworks such as TensorFlow and scikit-learn, and has experience building models using CNNs, LSTMs, RNNs, and traditional supervised and unsupervised learning algorithms. His workflow typically includes data collection, cleaning, feature engineering, model training, evaluation, optimization, and deployment.

At Affordable AI Technology, Mangesh Sambare works as a Junior Data Scientist and Project Mentor, where he leads and supports multiple client-based and internal AI projects. His work includes NLP-based systems, time series forecasting models, AI chatbots, data dashboards, and automation pipelines. He plays a key role in mentoring interns and trainees by guiding them through Python programming, SQL querying, machine learning concepts, deployment practices, and real-time project execution. He is directly involved in delivering internship programs that emphasize hands-on learning with live datasets and industry-style problem statements. The organization operates with a strong focus on practical skills, real client exposure, and career-oriented training, and its branch operations have expanded to locations such as Buttibori.

Before this role, he worked as a Data Analytics Intern at Softronix Software Services Pvt. Ltd., where he gained industry experience in data extraction, automation, and reporting. During this internship, he automated large-scale e-commerce data scraping using Selenium, BeautifulSoup, and ChromeDriver, significantly reducing manual data collection effort. He also designed a Flask-based data pipeline to handle automated data ingestion, preprocessing, and analysis. In addition, he developed Excel-based dashboards and reports to track business KPIs and product performance.

Mangesh Sambare has completed several end-to-end machine learning and deep learning projects. He developed a fashion item classification system using Convolutional Neural Networks trained on the Fashion MNIST dataset, achieving approximately ninety-two percent accuracy through proper normalization, dropout regularization, and activation tuning. He built an LSTM-based sentiment analysis system for e-commerce reviews using thousands of labeled samples and achieved over ninety-two percent accuracy. This system was integrated into a Flask web application for real-time sentiment prediction. He also completed a Flipkart product feedback analysis project involving the scraping of more than five thousand product reviews and ratings, followed by sentiment analysis and insight generation. Additionally, he developed automated image dataset acquisition tools using web scraping to collect high-quality image datasets from dynamically loaded web pages.

His technical skill set spans Python, SQL, Pandas, NumPy, Machine Learning, Deep Learning, NLP, Computer Vision, Time Series Forecasting, Feature Engineering, Model Evaluation, Hyperparameter Tuning, ETL pipelines, Data Warehousing concepts, RESTful APIs, Flask-based backend systems, Power BI dashboards, Excel analytics, web scraping, automation, Git, GitHub, and Agile-style project workflows. He is comfortable working on both research-oriented prototypes and production-style applications.

Apart from development work, Mangesh Sambare is deeply involved in teaching, mentoring, and content creation. He conducts structured training programs in Python, Data Science, Machine Learning, NLP, Deep Learning, Generative AI, Power BI, and Advanced Excel. His teaching approach emphasizes real-world examples, interview-oriented problem solving, and hands-on live projects. He regularly designs short-term courses such as Python Basics programs, Machine Learning bootcamps, and Data Science internship tracks that include training, project work, and career guidance.

He runs a technical YouTube channel named “LearnCode_Mangesh” (also known as “LearnCode Infinity by Mangesh Sambare”), where he publishes educational videos on Python programming, data science fundamentals, machine learning algorithms, SQL roadmaps, interview preparation, coding logic, and real-time project demonstrations. His content targets beginners, students, interns, and early-career professionals aiming to enter the data science and AI field.

Mangesh Sambare maintains an active GitHub profile where he hosts machine learning, NLP, generative AI, RAG-based systems, web scraping projects, and deployment-ready applications. His repositories often include complete pipelines, documentation, and practical implementations. On LinkedIn, he shares professional updates, technical milestones, training initiatives, and project-related insights, engaging with the broader data science and AI community.

He holds a Master of Science degree in Electronics from RTMNU Campus, Maharashtra, completed between 2020 and 2022, and a Bachelor of Science degree in Physics, Mathematics, and Electronics from Kamla Nehru College, Nagpur, completed between 2016 and 2020. He has completed a Data Science Bootcamp from iNeuron and holds a Power BI certification from Satish Dhawale.

Overall, Mangesh Sambare’s professional profile represents a combination of applied data science, AI engineering, technical mentoring, startup involvement, and educational content creation. His work focuses on building practical, scalable, and industry-relevant AI solutions while simultaneously training and guiding students and interns to become job-ready data professionals.
"""

In [65]:
import requests
import os
import json
import numpy as np

In [66]:
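def chunk_text(text, chunk_size=400, overlap=40):
    """Split text into windows of `chunk_size` words; consecutive windows share `overlap` words."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")
    words = text.split()
    chunks = []
    start = 0
    while start < len(words):
        end = min(start + chunk_size, len(words))
        chunks.append(" ".join(words[start:end]))
        if end == len(words):  # the window already reached the last word: avoid a redundant tail chunk
            break
        start += chunk_size - overlap  # step forward, keeping `overlap` words of shared context
    return chunks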
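To make the sliding window concrete, here is a self-contained copy of the chunker (renamed `chunk_words` here, a hypothetical name, so it does not shadow the notebook's function) run on a ten-word toy string with `chunk_size=4` and `overlap=1`. Each window starts `chunk_size - overlap` words after the previous one, and the loop stops once a window reaches the last word, so no redundant tail chunk is emitted.

```python
def chunk_words(text, chunk_size=400, overlap=40):
    """Split text into windows of chunk_size words; neighbouring windows share overlap words."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        end = min(start + chunk_size, len(words))
        chunks.append(" ".join(words[start:end]))
        if end == len(words):  # last window already covers the final word
            break
    return chunks


toy = "w0 w1 w2 w3 w4 w5 w6 w7 w8 w9"
for c in chunk_words(toy, chunk_size=4, overlap=1):
    print(c)
# w0 w1 w2 w3
# w3 w4 w5 w6
# w6 w7 w8 w9
```

Note how `w3` and `w6` appear in two windows each: that shared word is the overlap, which keeps a sentence cut at a boundary partly visible in both chunks.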

In [67]:
chunks = chunk_text(bio_text)

In [68]:
chunks

['Mangesh Sambare is an Indian Junior Data Scientist, AI Trainer, Technical Mentor, and Startup Professional with strong hands-on experience in Machine Learning, Deep Learning, Natural Language Processing, Data Analytics, and Generative AI. He is professionally associated with Affordable AI Technology, a data science and AI-focused organization that provides live project-based training, internships, and real-world industry exposure to students and freshers. He actively contributes to both technical development and training delivery within the organization. His primary technical expertise lies in Python and SQL for data analysis, preprocessing, automation, and backend logic. He works extensively with machine learning and deep learning frameworks such as TensorFlow and scikit-learn, and has experience building models using CNNs, LSTMs, RNNs, and traditional supervised and unsupervised learning algorithms. His workflow typically includes data collection, cleaning, feature engineering, mod

In [69]:
len(bio_text)

6143
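`len(bio_text)` counts characters (6143), whereas `chunk_text` counts whitespace-separated words, so the character count does not directly predict how many chunks are produced. A small helper (hypothetical, not part of the notebook) computes the chunk count from a word count for a chunker that stops once its window reaches the last word; passing it `len(bio_text.split())` is one way to sanity-check the chunk list above.

```python
import math


def expected_chunk_count(n_words, chunk_size=400, overlap=40):
    """Number of windows a word-based sliding-window chunker produces for n_words words."""
    if n_words == 0:
        return 0
    step = chunk_size - overlap
    # The first window covers chunk_size words; each further window adds `step` new words.
    return 1 + math.ceil(max(0, n_words - chunk_size) / step)


print(expected_chunk_count(10, chunk_size=4, overlap=1))  # 3
print(expected_chunk_count(400))  # 1
print(expected_chunk_count(401))  # 2
```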

In [70]:
API_URI = "https://api.euron.one/api/v1/euri/embeddings"
API_KEY = os.environ["EURI_API_KEY"]  # hypothetical variable name: read the key from the environment instead of hard-coding it
MODEL_NAME = "text-embedding-3-small"

In [71]:
header = {
    "Content-Type":"application/json",
    "Authorization":f"Bearer {API_KEY}"
}

In [83]:
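all_embedding = []
for chunk in chunks:
    payload = {
        "model": MODEL_NAME,
        "input": chunk
    }
    # json= serialises the payload; timeout stops a stalled request from hanging the cell
    response = requests.post(API_URI, headers=header, json=payload, timeout=30)
    response.raise_for_status()  # fail loudly on HTTP errors instead of a KeyError below
    result = response.json()
    embedding = result["data"][0]["embedding"]
    all_embedding.append(embedding)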
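`result["data"][0]["embedding"]` raises a bare `KeyError` if the server sends back an error body instead of embeddings. A minimal sketch of a more explicit parser follows; the helper name is hypothetical, and the exact shape of this API's error responses is an assumption (the notebook only shows a successful, OpenAI-style `{"object": "list", "data": [...]}` response).

```python
def extract_embedding(result):
    """Return the first embedding vector from an OpenAI-style embeddings response dict.

    Raises ValueError (including the raw body) when there is no non-empty "data" list,
    for example when the API returned an error object instead of embeddings.
    """
    data = result.get("data") if isinstance(result, dict) else None
    if not data:
        raise ValueError(f"No embeddings in response: {result!r}")
    return data[0]["embedding"]


ok = {"object": "list", "data": [{"object": "embedding", "embedding": [0.1, -0.2]}]}
print(extract_embedding(ok))  # [0.1, -0.2]
```

Inside the loop, `embedding = extract_embedding(result)` would replace the direct indexing.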

In [84]:
result

{'object': 'list',
 'data': [{'object': 'embedding',
   'embedding': [0.023845285,
    -0.018081943,
    0.020523664,
    0.028266784,
    0.020083714,
    -0.026638968,
    0.020886622,
    0.04260915,
    0.0036955795,
    -0.024835173,
    0.033502188,
    0.0027991815,
    -0.07492347,
    -0.016058173,
    0.020171704,
    0.089309834,
    -0.008645016,
    0.007616633,
    -0.0061043054,
    0.018037947,
    0.033348203,
    0.012846538,
    0.064628646,
    -0.0100583555,
    0.030862488,
    -0.0036680824,
    0.032006357,
    0.024527209,
    0.014947299,
    0.01994073,
    0.02265742,
    -0.013253491,
    -0.013781431,
    0.0015164524,
    0.006621246,
    0.016564114,
    -0.016564114,
    0.008155572,
    -0.009788886,
    -0.009079467,
    -0.0057963403,
    -0.055037737,
    -0.0005819025,
    0.08631817,
    -0.00054512545,
    -0.018862853,
    -0.04252116,
    -0.02479118,
    0.020688646,
    0.03875959,
    -0.011218723,
    -0.0289927,
    0.016971068,
    0.0155

In [85]:
for item in enumerate(chunks):
    print(item)  # each item is an (index, chunk_text) tuple

(0, 'Mangesh Sambare is an Indian Junior Data Scientist, AI Trainer, Technical Mentor, and Startup Professional with strong hands-on experience in Machine Learning, Deep Learning, Natural Language Processing, Data Analytics, and Generative AI. He is professionally associated with Affordable AI Technology, a data science and AI-focused organization that provides live project-based training, internships, and real-world industry exposure to students and freshers. He actively contributes to both technical development and training delivery within the organization. His primary technical expertise lies in Python and SQL for data analysis, preprocessing, automation, and backend logic. He works extensively with machine learning and deep learning frameworks such as TensorFlow and scikit-learn, and has experience building models using CNNs, LSTMs, RNNs, and traditional supervised and unsupervised learning algorithms. His workflow typically includes data collection, cleaning, feature engineering, 

In [86]:
result["data"][0]["embedding"]

[0.023845285,
 -0.018081943,
 0.020523664,
 0.028266784,
 0.020083714,
 -0.026638968,
 0.020886622,
 0.04260915,
 0.0036955795,
 -0.024835173,
 0.033502188,
 0.0027991815,
 -0.07492347,
 -0.016058173,
 0.020171704,
 0.089309834,
 -0.008645016,
 0.007616633,
 -0.0061043054,
 0.018037947,
 0.033348203,
 0.012846538,
 0.064628646,
 -0.0100583555,
 0.030862488,
 -0.0036680824,
 0.032006357,
 0.024527209,
 0.014947299,
 0.01994073,
 0.02265742,
 -0.013253491,
 -0.013781431,
 0.0015164524,
 0.006621246,
 0.016564114,
 -0.016564114,
 0.008155572,
 -0.009788886,
 -0.009079467,
 -0.0057963403,
 -0.055037737,
 -0.0005819025,
 0.08631817,
 -0.00054512545,
 -0.018862853,
 -0.04252116,
 -0.02479118,
 0.020688646,
 0.03875959,
 -0.011218723,
 -0.0289927,
 0.016971068,
 0.015585226,
 -0.022426447,
 0.009629404,
 0.02152455,
 0.0058898297,
 -0.008331552,
 -0.03306224,
 0.017861968,
 -0.0108007705,
 0.0014435857,
 0.037857693,
 0.019753752,
 0.04584278,
 -0.029850602,
 0.015475239,
 0.0022643672,
 -0.0

In [87]:
len(result["data"][0]["embedding"])

1536

In [88]:
len(all_embedding)

3

In [89]:
type(all_embedding)

list

In [90]:
embedding_array = np.array(all_embedding, dtype="float32")

In [91]:
embedding_array

array([[ 0.0308245 , -0.02561136,  0.01309264, ...,  0.01644053,
         0.03089624, -0.01211219],
       [ 0.02976203, -0.02132408, -0.00225043, ..., -0.00446059,
         0.01490579, -0.01374109],
       [ 0.02384529, -0.01808194,  0.02052366, ..., -0.0048422 ,
         0.02674896,  0.00363784]], shape=(3, 1536), dtype=float32)
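FAISS works on 2-D `float32` arrays of shape `(n_vectors, dimension)`, which is why the list of lists is converted with `dtype="float32"`. A slightly more defensive sketch (the helper name is hypothetical) also checks that every vector has the same length before stacking, which catches a truncated or failed API response early, and makes the array C-contiguous.

```python
import numpy as np


def to_faiss_matrix(vectors):
    """Stack equal-length embedding lists into a C-contiguous float32 (n, d) matrix."""
    dims = {len(v) for v in vectors}
    if len(dims) != 1:
        raise ValueError(f"Expected exactly one embedding dimension, got {sorted(dims)}")
    return np.ascontiguousarray(np.asarray(vectors, dtype=np.float32))


m = to_faiss_matrix([[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]])
print(m.shape, m.dtype)  # (2, 3) float32
```

For the notebook's data, `to_faiss_matrix(all_embedding)` should yield the same `(3, 1536)` array as above.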

In [101]:
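import faiss
# Exact (brute-force) index; 1536 is the text-embedding-3-small vector size.
# IndexFlatL2 reports squared Euclidean (L2) distances: smaller means more similar.
base_index = faiss.IndexFlatL2(1536)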
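`IndexFlatL2` compares the query against every stored vector and returns squared L2 distances, so the scores from `base_index.search` are not square-rooted. The NumPy sketch below (helper name hypothetical) computes the same kind of exact ranking without FAISS, which can be handy for checking a small index by hand; it illustrates the math only and is not how FAISS is implemented internally.

```python
import numpy as np


def flat_l2_search(database, queries, k):
    """Exact k-nearest-neighbour search by squared Euclidean distance.

    database: (n, d) array, queries: (m, d) array. Returns (distances, ids), each (m, k),
    nearest first, like the (D, I) pair returned by an IndexFlatL2 search.
    """
    # ||q - x||^2 for every query/database pair, shape (m, n)
    diff = queries[:, None, :] - database[None, :, :]
    sq_dist = np.sum(diff * diff, axis=-1)
    order = np.argsort(sq_dist, axis=1, kind="stable")[:, :k]
    return np.take_along_axis(sq_dist, order, axis=1), order


db = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]], dtype=np.float32)
q = np.array([[0.9, 0.0]], dtype=np.float32)
D, I = flat_l2_search(db, q, k=2)
print(I)  # [[1 0]]
```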

In [97]:
pip install faiss-cpu --force-reinstall


Collecting faiss-cpu
  Using cached faiss_cpu-1.13.2-cp311-cp311-win_amd64.whl.metadata (7.6 kB)
Collecting numpy<3.0,>=1.25.0 (from faiss-cpu)
  Using cached numpy-2.4.0-cp311-cp311-win_amd64.whl.metadata (6.6 kB)
Collecting packaging (from faiss-cpu)
  Using cached packaging-25.0-py3-none-any.whl.metadata (3.3 kB)
Using cached faiss_cpu-1.13.2-cp311-cp311-win_amd64.whl (18.9 MB)
Using cached numpy-2.4.0-cp311-cp311-win_amd64.whl (12.6 MB)
Using cached packaging-25.0-py3-none-any.whl (66 kB)
Installing collected packages: packaging, numpy, faiss-cpu
  Attempting uninstall: packaging
    Found existing installation: packaging 25.0
    Uninstalling packaging-25.0:
      Successfully uninstalled packaging-25.0
  Attempting uninstall: numpy
    Found existing installation: numpy 2.4.0
    Uninstalling numpy-2.4.0:
  You can safely remove it manually.
  You can safely remove it manually.


In [106]:
base_index.add(embedding_array)

In [107]:
faiss.write_index(base_index,"faiss_index.faiss")
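`write_index` saves only the vectors; the index does not store the chunk text. To answer queries in a later session, the chunk list has to be saved as well, in the same order in which the vectors were added, so that id `i` returned by a search still maps to `chunks[i]`. A minimal sketch using JSON (the helper names and the file name `chunks.json` are assumptions); the index itself can be reloaded with `faiss.read_index("faiss_index.faiss")`.

```python
import json
import os
import tempfile


def save_chunks(chunks, path):
    """Write chunk texts in index order so FAISS ids can be mapped back to text later."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(chunks, f, ensure_ascii=False)  # keep characters such as ’ readable


def load_chunks(path):
    with open(path, encoding="utf-8") as f:
        return json.load(f)


# Demo in a temporary directory; in the notebook this would be save_chunks(chunks, "chunks.json").
path = os.path.join(tempfile.mkdtemp(), "chunks.json")
save_chunks(["first chunk", "Sambare’s profile"], path)
print(load_chunks(path))  # ['first chunk', 'Sambare’s profile']
```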

### Query the FAISS index

In [121]:
query_test = "tell me about the Mangesh Sambare college year"

In [122]:
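def embedding_text(text):
    """Embed one string and return a (1, dim) float32 array, the 2-D shape FAISS search expects."""
    payload = {
        "model": MODEL_NAME,
        "input": text
    }
    response = requests.post(API_URI, headers=header, json=payload, timeout=30)
    response.raise_for_status()  # surface HTTP errors instead of failing on the indexing below
    result = response.json()
    embedding = result["data"][0]["embedding"]
    return np.array(embedding, dtype="float32").reshape(1, -1)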
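Each query costs a network call, and transient failures (timeouts, dropped connections) would otherwise abort the cell. One option is a small retry wrapper with exponential backoff; this is a generic sketch, the helper name is hypothetical, and which exceptions are worth retrying is a judgment call (an authentication error, for instance, will not fix itself).

```python
import time


def with_retries(fn, *args, attempts=3, base_delay=1.0, retry_on=(Exception,), sleep=time.sleep, **kwargs):
    """Call fn(*args, **kwargs), retrying on retry_on exceptions with delays base_delay * 1, 2, 4, ..."""
    for attempt in range(attempts):
        try:
            return fn(*args, **kwargs)
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of retries: surface the last error
            sleep(base_delay * 2 ** attempt)


# Demo with a stand-in function that fails twice, then succeeds; sleeping is disabled.
calls = {"n": 0}


def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("temporary failure")
    return "ok"


print(with_retries(flaky, sleep=lambda s: None))  # ok
```

In the notebook this might be used as `with_retries(embedding_text, query_test, retry_on=(requests.RequestException,))`, since `requests.RequestException` is the base class of the library's connection, timeout, and HTTP errors.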

In [123]:
query_test_embeding = embedding_text(query_test)

In [124]:
query_test_embeding

array([[ 0.04013788, -0.03656097,  0.0157548 , ..., -0.00894911,
         0.01395952, -0.00674084]], shape=(1, 1536), dtype=float32)

In [125]:
base_index.search(query_test_embeding,3)

(array([[0.91568  , 1.0437999, 1.0660416]], dtype=float32), array([[2, 0, 1]]))
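`search` returns two arrays: distances `D` and row ids `I`, one row per query, nearest first. Here id 2 (the chunk with the college details) ranks first. The sketch below (helper names and placeholder chunk strings are hypothetical) pairs ids with chunk text, skipping the `-1` ids FAISS uses to pad results when fewer than `k` vectors exist. It also converts squared L2 distance to cosine similarity via $\|a-b\|^2 = 2 - 2\cos(a,b)$, which holds only for unit-length vectors; OpenAI documents its embeddings as normalised to length 1, and this proxy is assumed to return them unchanged.

```python
import numpy as np


def top_chunks(distances, ids, chunks):
    """(id, distance, text) for the first query row, skipping FAISS's -1 padding ids."""
    return [(int(i), float(d), chunks[i]) for d, i in zip(distances[0], ids[0]) if i != -1]


def squared_l2_to_cosine(sq_dist):
    """For unit-length vectors, ||a - b||^2 = 2 - 2*cos(a, b), so cos = 1 - d / 2."""
    return 1.0 - np.asarray(sq_dist) / 2.0


# The (D, I) values printed above, with placeholder chunk texts.
D = np.array([[0.91568, 1.0437999, 1.0660416]], dtype=np.float32)
I = np.array([[2, 0, 1]])
for idx, dist, text in top_chunks(D, I, ["chunk 0", "chunk 1", "chunk 2"]):
    print(idx, round(float(squared_l2_to_cosine(dist)), 3), text)
# 2 0.542 chunk 2
# 0 0.478 chunk 0
# 1 0.467 chunk 1
```

Under the unit-norm assumption, ranking by smallest squared L2 distance and by largest cosine similarity gives the same order.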

In [126]:
chunks[2]

'a Bachelor of Science degree in Physics, Mathematics, and Electronics from Kamla Nehru College, Nagpur, completed between 2016 and 2020. He has completed a Data Science Bootcamp from iNeuron and holds a Power BI certification from Satish Dhawale. Overall, Mangesh Sambare’s professional profile represents a combination of applied data science, AI engineering, technical mentoring, startup involvement, and educational content creation. His work focuses on building practical, scalable, and industry-relevant AI solutions while simultaneously training and guiding students and interns to become job-ready data professionals.'
