
In [1]:
faqs = """About the Program
This program follows a monthly subscription model at Rs 799 per month.
The total duration of the DSMP is seven months.
The approximate overall fee across seven months is Rs 5600.
You can join anytime without waiting for a new batch to start.
All live sessions are recorded for later viewing.
Session recordings appear in your dashboard after class.
Typical live sessions run for about two hours.
The teaching language used in class is Hinglish.
Non-technical learners are welcome to enroll and start from basics.
Joining mid-way still grants access to past recordings during validity.

Syllabus Overview
The syllabus covers Python Fundamentals for complete beginners.
You will learn Python libraries used in Data Science work.
Data Analysis modules focus on EDA and storytelling with data.
SQL for Data Science includes joins and window functions.
Maths for Machine Learning keeps intuition first and formulas second.
ML Algorithms include supervised and unsupervised techniques.
Practical ML focuses on pipelines, validation, and feature engineering.
MLOps introduces experiment tracking, packaging, and simple CI/CD.
Case studies simulate realistic business problems and decisions.
Deep Learning and NLP are not part of this curriculum.

Links and Resources
You can check detailed syllabus on the official course page.
The monthly timetable is shared through a public Google Sheet.
Official payments are made only through the course website.
Reminder emails are sent before each paid session.
Dashboard access shows recordings, notes, and the doubt form.

Live Sessions
If you miss a session, you can watch the recording later.
Most sessions last roughly one hundred and twenty minutes.
Slides are primarily in English with Hinglish explanations.
Q&A time is reserved at the end of the session.
Weekly schedules are announced on the shared sheet.

Access and Validity
Your subscription is valid for thirty days from purchase time.
Renewals extend your access window by another thirty days.
During an active cycle, past paid content remains unlocked.
If your validity lapses, you will need to renew to continue.
Joining on any date shifts your next renewal to that date.

Refund Policy
A seven-day refund window starts from your payment date.
Refunds requested after seven days are not eligible.
Plan your first week to evaluate fit before the window closes.
Contact support if you need help initiating a refund request.
Refunds apply to the latest payment within policy limits.

Payments
Monthly payments must be made on the official website.
Do not pay through third-party links or private messages.
Receipts are emailed after successful payment processing.
International payment issues can be escalated by email.
Include your registered email and phone number when writing support.

Eligibility and Onboarding
Beginners from non-tech backgrounds can join confidently.
The course starts from Python basics and builds gradually.
You can jump in mid-month and begin with recordings.
Your dashboard unlocks immediately after successful payment.
Orientation notes help you navigate the platform quickly.

Doubt Support
You can fill a doubt form through the dashboard.
The team schedules one-on-one clarity calls for complex issues.
Past-week doubts can still be raised using the form option.
Provide examples or screenshots to speed up resolution.
Response times are communicated after form submission.

Certificate Criteria
You must complete full fee payment across seven months.
You must attempt all course assessments to qualify.
Certificates are issued to learners who meet both criteria.
Assessments emphasize applied understanding over rote math.
Keep submission notes concise and clearly structured.

Placement Assistance
Placement assistance does not imply a placement guarantee.
Job offers or interviews are not guaranteed by the program.
Assistance includes portfolio building and resume guidance.
Soft-skill sessions improve communication and interviews.
Mentor sessions add real-world perspective and feedback.
Job hunting strategies cover ATS keywords and outreach.
You should expect guidance, not assured outcomes.

Content Emphasis
Python Fundamentals cover syntax, control flow, and functions.
Libraries for DS include NumPy, Pandas, and Matplotlib.
Data Analysis focuses on tidy data and reproducible EDA.
SQL practice builds confidence with joins and aggregates.
Maths for ML develops intuition for vectors and gradients.
Algorithms include linear models, trees, and clustering.
Practical ML stresses pipelines and validation rigor.
MLOps introduces tracking, packaging, and deployments.
Case studies connect methods to business decisions.

Recordings and Dashboard
Recordings are your safety net for missed classes.
Videos appear in the dashboard within the validity period.
Downloadable resources are attached where permitted.
You can rewatch tricky segments at your own pace.
Keep personal notes aligned to each module outcome.

Scheduling
Typical classes run in the evening IST schedule.
Exact start times are posted on the weekly sheet.
Reminder emails arrive before each live session.
Calendar links may be provided for convenience.
Check the sheet regularly for any timing updates.

Policies and Safety
Always pay only through the official website link.
Never share OTPs or passwords with anyone.
Support will never ask for your confidential info.
Use the listed emails for payment-related queries.
Verify URLs before completing a transaction.

International Learners
If cards fail, contact support for alternatives.
Share transaction error screenshots for faster help.
Confirm time zone differences for live sessions.
Recordings help when time zones are challenging.
Support provides guidance tailored to your region.

Learning Approach
Focus on clarity and reproducibility over complexity.
Start with baseline models before heavy tuning.
Use proper validation to avoid leakage pitfalls.
Document assumptions at the top of each notebook.
Prefer readable code and clear variable names.

Evaluation Style
Assessments are short and focused on application.
Rubrics reward reasoning and decision justification.
Error analysis is valued alongside metric scores.
Write concise summaries of your modeling choices.
Link metrics to practical business costs where possible.

Practical Tips
Pin library versions to stabilize your environment.
Seed randomness to reproduce key results consistently.
Keep datasets versioned as experiments progress.
Use checklists to reduce last-minute mistakes.
Save artifacts with consistent naming conventions.

SQL Module Highlights
Practice joins across fact and dimension tables.
Use window functions for rankings and rolling stats.
Write groupby summaries at meaningful aggregation levels.
Consider indexes and query plans for performance.
Write clear SQL with consistent formatting.

Pandas and Visualization
Indexing and selection patterns improve readability.
Groupby pipelines summarize behavior effectively.
Avoid chained operations when clarity suffers.
Label axes and titles for meaningful charts.
Choose appropriate scales for honest visuals.

Maths Essentials
Vectors and matrices are introduced with intuition.
Gradients are linked to simple geometric ideas.
Bias-variance tradeoff is explained with examples.
Probability basics support reasoning under uncertainty.
You learn enough math to use models responsibly.

ML Algorithms
Begin with linear and logistic regression baselines.
Move to trees and ensembles for nonlinear structure.
Try clustering for unsupervised pattern discovery.
Use cross-validation to compare model families.
Tune hyperparameters only after strong baselines.

MLOps Basics
Track experiments with simple run identifiers.
Package code for predictable training and inference.
Capture environment details for reproducibility.
Automate small checks in a lightweight CI step.
Log decisions and metrics for future audits.

Case Studies
Start with a crisp problem statement and success metric.
Explore data visually to form testable hypotheses.
Engineer features grounded in domain intuition.
Validate with appropriate temporal splits when needed.
Present tradeoffs and a pragmatic recommendation.

Communication
Use plain language in reports and presentations.
Prefer few clear plots over many noisy ones.
Explain why a metric was chosen for the task.
State assumptions and limitations openly.
Outline next steps with realistic timelines.

Enrollment Flexibility
Mid-month joins are supported by rolling validity.
Renewals occur thirty days after your payment date.
Access continues until the current cycle ends.
Rejoining later restores your dashboard promptly.
You can learn at a pace that fits your schedule.

Support Channels
Email support for payment or access issues.
Include registered email and phone in messages.
Attach relevant screenshots for context.
Expect confirmation and estimated response windows.
Escalations are available for unresolved cases.

What’s Not Included
Deep Learning is outside the current scope.
NLP topics are not covered in this program.
Placement guarantees are not offered by the team.
Lifetime access is not provided due to low fees.
Advanced research topics are out of scope here.

After Course Access
After completing seven payments you keep access until the stated end date.
Final access windows are communicated near course completion.
Policies may update; refer to official pages for changes.
Timelines align with the published DSMP cohort information.
Use the dashboard to check your current validity dates.

Study Habits
Block time after each class to review notes.
Rewatch complex parts of recordings at higher speed.
Practice SQL and Pandas daily for fluency.
Summarize each module in your own words.
Share doubts early through the form.

Quality and Integrity
Cite data sources when using external datasets.
Avoid leaking information across data splits.
Prefer interpretable baselines before complex stacks.
Use calibration where thresholds matter to outcomes.
Keep feedback loops open with mentors and peers.

Community and Mentorship
Portfolio sessions cover project curation and impact framing.
Soft-skill exercises include STAR answers and mock interviews.
Mentor talks reveal real constraints from industry work.
Networking tips center on targeted outreach and follow-ups.
Job strategies emphasize ATS alignment to job descriptions.

Admin Reminders
All payments are processed on the official website.
Refunds are handled only within the seven-day window.
Schedules are updated in the shared Google Sheet.
Official updates are sent via registered email.
Keep your profile details accurate in the dashboard.

Contact
For payments and access, write to nitish.campusx@gmail.com.
Use clear subject lines describing your issue briefly.
Include your registered email and transaction details.
Do not share sensitive information in emails.
Expect a response with next steps and timelines.

Wrapping Up
The DSMP focuses on solid DS foundations and practical ML.
You will learn Python, SQL, Maths, and core ML methods.
MLOps adds tracking and simple deployment discipline.
Case studies tie methods to decisions stakeholders care about.
Recordings, rolling validity, and guidance keep learning flexible."""

In [2]:
import tensorflow as tf
from tensorflow.keras.preprocessing.text import Tokenizer

In [3]:
tokenizer = Tokenizer()

In [4]:
tokenizer.fit_on_texts([faqs])
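`fit_on_texts` builds `tokenizer.word_index` by ranking words by frequency. The most common word gets index 1, and index 0 is never assigned, so it stays free for padding. The sketch below roughly approximates the default behaviour: lowercasing, replacing the default punctuation filter with spaces, then ranking by count. `simple_word_index` is a hypothetical helper, not part of Keras, and edge cases may differ from the real `Tokenizer`.

```python
from collections import Counter

# Assumed to mirror Tokenizer's default `filters` string (note that '-' is included)
DEFAULT_FILTERS = '!"#$%&()*+,-./:;<=>?@[\\]^_`{|}~\t\n'


def simple_word_index(texts, filters=DEFAULT_FILTERS):
    """Hypothetical approximation of Tokenizer.fit_on_texts + word_index."""
    table = str.maketrans({ch: ' ' for ch in filters})
    counts = Counter()
    for text in texts:
        cleaned = text.lower().translate(table)  # punctuation becomes whitespace
        counts.update(w for w in cleaned.split(' ') if w)
    # Most frequent word first; sorted() is stable, so ties keep first-seen order
    ranked = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
    # Start at 1 so that index 0 remains reserved for padding
    return {word: i + 1 for i, (word, _) in enumerate(ranked)}


print(simple_word_index(["Hi, hi! Bye."]))
```

Because the hyphen is filtered, a phrase like "Non-technical" would be split into two tokens under this approximation.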

In [5]:
len(tokenizer.word_index)

754

In [6]:
input_sequences = []
for sentence in faqs.split('\n'):
  tokenized_sentence = tokenizer.texts_to_sequences([sentence])[0]

  for i in range(1,len(tokenized_sentence)):
    input_sequences.append(tokenized_sentence[:i+1])
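The loop turns every line into all of its prefixes of length two or more, so each prefix becomes one training example whose last token is the word to predict. Isolated as a pure function (the name `prefix_ngrams` is only illustrative), the expansion looks like this:

```python
def prefix_ngrams(token_ids):
    """Return [t0, t1], [t0, t1, t2], ... for a list of token ids."""
    return [token_ids[:i + 1] for i in range(1, len(token_ids))]


# A 3-token line yields 2 examples, matching the first rows of input_sequences
print(prefix_ngrams([83, 2, 55]))
```

One consequence: a single-word line, such as a section heading like "Contact", produces no training examples, because it has no prefix of length two.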

In [7]:
input_sequences

[[83, 2],
 [83, 2, 55],
 [84, 55],
 [84, 55, 268],
 [84, 55, 268, 9],
 [84, 55, 268, 9, 85],
 [84, 55, 268, 9, 85, 144],
 [84, 55, 268, 9, 85, 144, 145],
 [84, 55, 268, 9, 85, 144, 145, 24],
 [84, 55, 268, 9, 85, 144, 145, 24, 146],
 [84, 55, 268, 9, 85, 144, 145, 24, 146, 269],
 [84, 55, 268, 9, 85, 144, 145, 24, 146, 269, 270],
 [84, 55, 268, 9, 85, 144, 145, 24, 146, 269, 270, 86],
 [2, 271],
 [2, 271, 272],
 [2, 271, 272, 25],
 [2, 271, 272, 25, 2],
 [2, 271, 272, 25, 2, 87],
 [2, 271, 272, 25, 2, 87, 14],
 [2, 271, 272, 25, 2, 87, 14, 26],
 [2, 271, 272, 25, 2, 87, 14, 26, 88],
 [2, 273],
 [2, 273, 274],
 [2, 273, 274, 147],
 [2, 273, 274, 147, 56],
 [2, 273, 274, 147, 56, 26],
 [2, 273, 274, 147, 56, 26, 88],
 [2, 273, 274, 147, 56, 26, 88, 14],
 [2, 273, 274, 147, 56, 26, 88, 14, 146],
 [2, 273, 274, 147, 56, 26, 88, 14, 146, 275],
 [7, 15],
 [7, 15, 148],
 [7, 15, 148, 276],
 [7, 15, 148, 276, 277],
 [7, 15, 148, 276, 277, 278],
 [7, 15, 148, 276, 277, 278, 3],
 [7, 15, 148, 27

In [8]:
max_len = max([len(x) for x in input_sequences])

In [9]:
max_len

12

In [10]:
from tensorflow.keras.preprocessing.sequence import pad_sequences
padded_input_sequences = pad_sequences(input_sequences, maxlen = max_len, padding='pre')

In [11]:
padded_input_sequences

array([[  0,   0,   0, ...,   0,  83,   2],
       [  0,   0,   0, ...,  83,   2,  55],
       [  0,   0,   0, ...,   0,  84,  55],
       ...,
       [  0,   0,   0, ...,   1,  79,  31],
       [  0,   0,   0, ...,  79,  31,  46],
       [  0,   0,   0, ...,  31,  46, 754]], dtype=int32)
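With `padding='pre'`, zeros are added on the left, so the most recent words always sit in the last columns, right before the target. The NumPy sketch below reproduces that layout for illustration. `pre_pad` is a hypothetical helper, and it assumes over-long sequences are cut from the front, which is what `pad_sequences` does by default (`truncating='pre'`).

```python
import numpy as np


def pre_pad(sequences, maxlen, value=0):
    """Left-pad (and left-truncate) each sequence to exactly maxlen tokens."""
    out = np.full((len(sequences), maxlen), value, dtype=np.int32)
    for row, seq in enumerate(sequences):
        trimmed = list(seq)[-maxlen:]  # keep only the last maxlen tokens
        if len(trimmed) > 0:           # an empty slice would break the -0 index below
            out[row, -len(trimmed):] = trimmed
    return out


print(pre_pad([[83, 2], [83, 2, 55]], maxlen=4))
```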

In [12]:
X = padded_input_sequences[:,:-1]

In [13]:
y = padded_input_sequences[:,-1]

In [14]:
X.shape

(1406, 11)

In [15]:
y.shape

(1406,)
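Slicing off the last column is what turns each padded row into an (input, target) pair: the first `max_len - 1 = 11` columns are the context and the final column is the next word. A tiny illustration on hand-made rows (the helper `split_xy` and the toy array are made up for this example):

```python
import numpy as np


def split_xy(padded):
    """Context = every column but the last; target = the last column."""
    return padded[:, :-1], padded[:, -1]


toy = np.array([
    [0, 0, 83, 2],
    [0, 83, 2, 55],
    [0, 0, 84, 55],
])
X_toy, y_toy = split_xy(toy)
print(X_toy.shape, y_toy.tolist())
```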

In [16]:
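from tensorflow.keras.utils import to_categorical
# word_index uses indices 1..754 and 0 is reserved for padding, so there are 755 classes
y = to_categorical(y, num_classes=len(tokenizer.word_index) + 1)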

In [17]:
y.shape

(1406, 755)
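Each target index becomes a one-hot row of length 755. The NumPy equivalent below (a hypothetical `one_hot` helper, not the Keras implementation) shows why the class count must be `len(word_index) + 1`. The largest word index is 754, and a 754-column matrix would have no column for it.

```python
import numpy as np


def one_hot(labels, num_classes):
    """Row i is all zeros except 1.0 at column labels[i]."""
    labels = np.asarray(labels, dtype=np.int64)
    return np.eye(num_classes, dtype=np.float32)[labels]


print(one_hot([2, 0], 4))

try:
    one_hot([754], 754)  # index 754 does not exist among 754 columns (0..753)
except IndexError as err:
    print("IndexError:", err)
```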

In [18]:
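from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Input, Embedding, LSTM, Dense

In [19]:
vocab_size = len(tokenizer.word_index) + 1  # 755: word indices 1..754 plus 0 for padding

model = Sequential()
# Each training input has max_len - 1 = 11 tokens (the 12th column is the target)
model.add(Input(shape=(max_len - 1,)))
model.add(Embedding(vocab_size, 100))
model.add(LSTM(150))  # a single LSTM layer with 150 units
model.add(Dense(vocab_size, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])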



In [26]:
model.summary()
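The parameter counts reported by `model.summary()` can be checked by hand. The sketch below assumes the default `use_bias=True` on every layer and the standard LSTM layout of four gates, each with an input kernel, a recurrent kernel, and a bias. The helper names are illustrative only.

```python
def embedding_params(vocab_size, embed_dim):
    # One embed_dim-sized vector per vocabulary index (including padding index 0)
    return vocab_size * embed_dim


def lstm_params(input_dim, units):
    # 4 gates x (input kernel + recurrent kernel + bias)
    return 4 * (units * (input_dim + units) + units)


def dense_params(input_dim, units):
    return input_dim * units + units  # weights + biases


vocab_size, embed_dim, lstm_units = 755, 100, 150
counts = {
    "embedding": embedding_params(vocab_size, embed_dim),
    "lstm": lstm_params(embed_dim, lstm_units),
    "dense": dense_params(lstm_units, vocab_size),
}
total = sum(counts.values())
print(counts, total)
```

The output layer alone holds about a third of the weights, because it has to score every word in the vocabulary.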

In [21]:
model.fit(X, y, epochs=100)

Epoch 1/100
[1m44/44[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m4s[0m 24ms/step - accuracy: 0.0299 - loss: 6.5761
Epoch 2/100
[1m44/44[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m1s[0m 23ms/step - accuracy: 0.0515 - loss: 6.0576
Epoch 3/100
[1m44/44[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m1s[0m 23ms/step - accuracy: 0.0453 - loss: 5.9737
Epoch 4/100
[1m44/44[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m1s[0m 23ms/step - accuracy: 0.0461 - loss: 5.9298
Epoch 5/100
[1m44/44[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m1s[0m 24ms/step - accuracy: 0.0514 - loss: 5.8326
Epoch 6/100
[1m44/44[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m1s[0m 23ms/step - accuracy: 0.0531 - loss: 5.6547
Epoch 7/100
[1m44/44[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m1s[0m 23ms/step - accuracy: 0.0506 - loss: 5.4902
Epoch 8/100
[1m44/44[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m1s[0m 28ms/step - accuracy: 0.0624 - loss: 5.3968
Epoch 9/100
[1m44/44[0m [32m━━━━━━━━━

<keras.src.callbacks.history.History at 0x7c5494706720>
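A useful sanity check for the first epochs is the loss of a model that spreads probability uniformly over all 755 classes. Its cross-entropy is -log(1/V) = log(V), and its accuracy is 1/V. The sketch below only computes those reference numbers. It assumes the natural-log cross-entropy Keras uses for `categorical_crossentropy`.

```python
import math

vocab_size = 755
uniform_loss = math.log(vocab_size)  # cross-entropy of a uniform guess, about 6.63
chance_accuracy = 1 / vocab_size     # about 0.13%
print(round(uniform_loss, 4), round(chance_accuracy, 6))
```

The epoch-1 loss of 6.5761 sits just below this baseline, as expected for a model that has barely started learning. The early accuracies of 3 to 6 percent are already well above chance, most likely because the model first learns to predict the most frequent words.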

In [22]:
import numpy as np
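The generation loop always takes the `np.argmax` of the prediction (greedy decoding), which tends to lock into repetitive phrases. A common alternative, not used in the original notebook, is temperature sampling: rescale the log-probabilities by a temperature, renormalize, and draw the next index at random. Lower temperatures behave more like argmax, and higher ones produce more varied text. `sample_next` is a hypothetical helper. It also assumes index 0 should never be chosen, because that index is padding rather than a word.

```python
import numpy as np


def sample_next(probs, temperature=1.0, rng=None, skip_index=0):
    """Draw one class index from a probability vector after temperature scaling."""
    rng = np.random.default_rng() if rng is None else rng
    probs = np.asarray(probs, dtype=np.float64).copy()
    if skip_index is not None:
        probs[skip_index] = 0.0            # index 0 is padding, never a real word
    logits = np.log(probs + 1e-12) / temperature
    logits -= logits.max()                 # subtract the max for numerical stability
    weights = np.exp(logits)
    weights /= weights.sum()               # renormalize into a valid distribution
    return int(rng.choice(len(weights), p=weights))


rng = np.random.default_rng(0)
print(sample_next([0.1, 0.7, 0.2], temperature=0.5, rng=rng))
```

In the loop, this could replace the argmax line, e.g. `pos = sample_next(model.predict(padded_token_text, verbose=0)[0], temperature=0.8)`. The value 0.8 is only a starting point to tune by eye.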

In [23]:
import time
text = "what is"

for i in range(20):
  # tokenize the current text (words missing from the vocabulary are dropped)
  token_text = tokenizer.texts_to_sequences([text])[0]
  # pre-pad to the training input length: max_len - 1 = 11 tokens
  padded_token_text = pad_sequences([token_text], maxlen=max_len - 1, padding='pre')
  # predict: take the most probable next-word index (greedy decoding)
  pos = int(np.argmax(model.predict(padded_token_text)))
  # map the index back to its word; index 0 is padding and has no word
  word = tokenizer.index_word.get(pos)
  if word is None:
    break
  text = text + " " + word
  print(text)
  time.sleep(2)

1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 216ms/step
what is align
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 43ms/step
what is align with
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 46ms/step
what is align with the
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 52ms/step
what is align with the published
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 71ms/step
what is align with the published dsmp
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 45ms/step
what is align with the published dsmp one
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 48ms/step
what is align with the published dsmp one on
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 51ms/step
what is align with the published dsmp one on one
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 44ms/step
what is align with the published dsmp one on one one
1/1 ━━━━━━━━━━━━━━━━━━