$$\text{Anatomy of Language}$$
- Language 
    - Language is a structured system of communication. The structure of a language is its grammar and the free components are its vocabulary.
        - Early Languages
        - Modern Linguistics

- Language as a phenomenon
    - Language is considered a social phenomenon because all human beings communicate within their speech communities using the language they speak.
        - Spoken Language
        - Written Language
- Semantics
    - Language exists to be meaningful; the study of meaning, both in general theoretical terms and in reference to a specific language, is known as semantics.
    - Semantics embraces the meaningful functions of phonological features, such as intonation, and of grammatical structures and the meanings of individual words.

- Language Variants
    - Language refers to both the universal human ability to communicate and its specific forms, like English, French, or Swahili.

- Physiological and physical basis
    - Language originated as a spoken system, evolving naturally with human communication, while writing emerged only 4,000–5,000 years ago as a way to represent speech. For most of human history, language was passed down orally.

- Speech Production
    - Speaking arises from exhaling air during respiration, modified by vocal tract movements to produce various sounds.

- Language Acquisition
    - All humans are physiologically alike in speech production. Language is learned from upbringing, not inherited; adopted children acquire their adoptive parents' language.


- Meaning and Style in Language
    - Language exists to convey meaning, shaped by the diverse needs of human communication, making the study of meaning highly complex.

- Structural, or grammatical, meaning
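    - Sentence meaning combines word meanings with grammatical structure. *The dog chased the cat* and *The boy chased the cat* share the same structure but differ in one word, while *The cat chased the dog* uses the same words as the first sentence in a different arrangement and therefore means something different.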

- Lexical meaning
    - Word meaning, or lexical meaning, refers to the individual definitions of words, as outlined in dictionaries. Answering "What does this word mean?" is often harder than it seems.

- Language and culture
    - Language is deeply tied to community life and culture, shaping and reflecting the details of daily living universally across all languages.

- Transmission of Language and culture
    - Language is primarily learned through cultural exposure, with limited direct teaching, as children construct grammar from the speech they hear. Language is an integral part of culture, influencing and reflecting societal membership.

- Symbolic Systems
    - Symbolic systems represent the world using meaningful symbols, including human and computer languages.
- Types of Languages 
    - Artificial Language 
        - Artificial languages arise from simulations, interactions, or experiments, evolving naturally rather than being consciously designed. They are used in cultural evolution studies and psycholinguistic research.
    - Constructed Language
        - Constructed languages, or conlangs, are intentionally designed for purposes like communication, fiction, experimentation, or art. Examples include international auxiliary languages and fictional languages.
    - Logical Language
        - Logical languages, such as Loglan and Lojban, use formal logic to eliminate ambiguity, aiming for precision in expression.
    - Programming Language
        - Programming languages, text-based or graphical, enable writing computer programs. They are defined by syntax and semantics, with some based on specifications (e.g., C) and others by dominant implementations (e.g., Perl).
    - Natural Language
        - Natural languages evolve naturally through human use and differ from artificial or constructed languages. They are systematic, conventional, redundant, and subject to change over time.
        - Natural Imprecision
            - Natural language reflects human cognition but includes vague terms like "tall" or "hot," challenging precise translation into computational reasoning.
- Linguistics
    - Linguistics
        - Linguistics, the science of language, studies human communication, language families, and specific languages, involving subfields like phonetics, syntax, semantics, and sociolinguistics.
    - Applied Linguistics
        - Applied linguistics applies linguistic research to areas like education, translation, lexicography, language policy, and natural language processing.



$$\text{Language Analysis and Computational Linguistics}$$
 
- Language Analysis
    - **Purpose**: To understand how meaning is conveyed using language techniques (e.g., tone, word choice).
        - Identify techniques and their effects.

        ![image.png](attachment:image.png)

- Techniques
    - **Persuasive Techniques**: Analyze how these influence audience perception and response.

- Levels of Analysis
    - **Phonology**: Sound system of a language.
    - **Grammar (Morphology/Syntax)**: Structure of words and sentences.
    - **Discourse and Pragmatics**: Contextual and functional use of language.

- Genre and Audience
    - **Genre**: Groups texts by style and theme (e.g., fantasy, poetry).
    - **Audience**: Tailors content to engage target readers effectively.

- Foregrounding
    - **Attention-Getting Techniques**: Uses repetition (parallelism) or breaks patterns (deviation).

- Literariness
    - **Value in Texts**: Aesthetic and moral qualities elevate texts to literary works.

- Paradigm and Syntagm
    - **Paradigm**: Substitution relationships (e.g., noun for noun).
    - **Syntagm**: Positional relationships in sentence structure.

- Form and Function
    - **Form**: Identifies parts of speech and structures.
    - **Function**: Explains roles (nominal, adjectival, adverbial) in context.

- Linguistic Analysis
    - **Focus Areas**:
        - **Phonetics**: Studies sound production and perception.
        - **Phonology**: Explores sound systems.
        - **Morphology**: Investigates word structures.

            ![image-2.png](attachment:image-2.png)
        - **Syntax**: Examines sentence formation rules.

            ![image-3.png](attachment:image-3.png)
        - **Semantics**: Analyzes meaning.

            ![image-4.png](attachment:image-4.png)
        - **Pragmatics**: Studies social use of language.

            ![image-5.png](attachment:image-5.png)
- Lexicology
    - **Word Analysis**: Examines formation, usage, and relationships between words.

- Artificial Intelligence
    - **Definition**: Simulates human intelligence to mimic cognitive tasks.
        - **Strong AI**: Abstract reasoning and human-level thinking (future goal).
        - **Weak AI**: Pattern-based automation (e.g., driving, translation).

        ![image-6.png](attachment:image-6.png)

- AI Techniques
    - **Logic and Rules-Based**: Top-down rules for reasoning.
    - **Machine Learning**: Pattern detection for self-learning systems.
 
- Branches of Artificial Intelligence
    - Artificial Intelligence (AI) encompasses several key techniques and applications to solve real-world problems, including:  
        - Machine Learning  
        - Deep Learning  
        - Natural Language Processing  
        - Robotics  
        - Expert Systems  
        - Fuzzy Logic  


- Machine Learning (ML)
    - Machine learning, a subset of AI, enables systems to learn and improve from experience without explicit programming. ML systems analyze data, identify patterns, and make decisions with minimal human intervention. Its core goal is to allow computers to learn autonomously and refine their actions accordingly.
    - ML focuses on developing algorithms that transform data into intelligent action. These algorithms find applications in various domains, such as predictive modeling and decision-making.


- Deep Learning (DL)
    - Deep learning is a specialized subset of ML that uses neural networks with multiple layers. These networks are loosely inspired by the brain and can process large datasets. Each additional layer can learn more abstract representations of the data, which often improves prediction accuracy on complex tasks.


- Robotics
    - Robotics integrates engineering and AI to design, manufacture, and operate robots. These intelligent machines can assist humans in tasks ranging from industrial automation to healthcare services. Forms of robotics include humanoid robots and software-based robotic process automation (RPA).


- Expert Systems
    - Expert systems simulate human expertise using AI technologies. They combine a knowledge base of facts and rules with inference engines to solve domain-specific problems. While they complement human experts, they are not designed to replace them.
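A minimal sketch of the knowledge base plus inference engine split described above, assuming a hypothetical car-troubleshooting rule base and plain forward chaining: a rule fires whenever all of its conditions are known facts, and this repeats until nothing new can be derived. The rule and fact names are invented; real expert-system shells add conflict resolution, certainty factors, and explanation facilities.

```python
# Knowledge base: (set of conditions, conclusion) rules. Contents are hypothetical.
RULES = [
    ({"engine_wont_start", "lights_dim"}, "battery_weak"),
    ({"battery_weak", "battery_old"}, "replace_battery"),
    ({"engine_wont_start", "fuel_gauge_empty"}, "out_of_fuel"),
]


def forward_chain(facts, rules=RULES):
    """Inference engine: keep applying rules until no new facts are produced."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for conditions, conclusion in rules:
            # Fire the rule if every condition is already a known fact.
            if conditions <= known and conclusion not in known:
                known.add(conclusion)
                changed = True
    return known


print(sorted(forward_chain({"engine_wont_start", "lights_dim", "battery_old"})))
```

Chaining matters here: `replace_battery` is only reachable after `battery_weak` has been derived by an earlier rule.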


- Fuzzy Logic
    - Fuzzy logic introduces degrees of truth to computing, as opposed to binary true/false logic. It allows systems to handle uncertainty and approximate reasoning effectively, particularly in control systems and decision-making applications.
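A minimal sketch of "degrees of truth", assuming a hypothetical piecewise-linear membership function for the vague term "tall" (the 160 cm and 190 cm breakpoints are invented for illustration): instead of true/false, each height gets a membership value between 0 and 1, and the common Zadeh min/max/complement operators combine such values.

```python
def tall(height_cm, low=160.0, high=190.0):
    """Membership in 'tall': 0 at or below `low`, 1 at or above `high`, linear in between."""
    if height_cm <= low:
        return 0.0
    if height_cm >= high:
        return 1.0
    return (height_cm - low) / (high - low)


def fuzzy_and(a, b):  # Zadeh AND: minimum of the memberships
    return min(a, b)


def fuzzy_or(a, b):   # Zadeh OR: maximum of the memberships
    return max(a, b)


def fuzzy_not(a):     # complement
    return 1.0 - a


for h in (150, 170, 175, 195):
    print(h, round(tall(h), 2))  # 0.0, 0.33, 0.5, 1.0
```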


- Natural Language Processing (NLP)

    ![image-8.png](attachment:image-8.png)
    - NLP enables computers to understand, interpret, and respond to human language. With applications in medical research, search engines, and business intelligence, NLP encompasses:  
        - **Natural Language Understanding (NLU):** Focused on interpreting input (text or speech) and identifying intents and entities.  
        - **Natural Language Generation (NLG):** Uses AI to generate written or spoken language from structured data. It includes processes like content analysis, data understanding, and grammatical structuring to produce human-like text. NLG is widely used for news reporting, customer messaging, and business content creation. 
        ![image-7.png](attachment:image-7.png)
    - **Applications:**  
        - **Interactive Voice Response (IVR):** Enhances customer service through voice-enabled systems.  
        - **Chatbots:** Automate customer support using predefined scripts.  
        - **Machine Translation:** Automates text translation using AI models.  
        - **Conversational Interfaces:** Powers devices like Amazon Alexa and Google Home.  




- Computational Linguistics
    - Computational Linguistics (CL) combines linguistics and computer science to analyze language. Applications include machine translation, speech recognition, text summarization, and building conversational agents. Approaches in CL include:  
        - **Corpus-based and Structural Approaches:** Analyze large language datasets.  
        - **Interactive Approaches:** Use text or speech inputs to generate responses.  
        - **Developmental Approaches:** Mimic language acquisition processes for learning over time.  
 

$$\text{Deep Parsing and Tools for NLP}$$

- Syntactic Parsing 
  - Syntax refers to the arrangement of words in sentences.
  - Syntax structures define parts of speech and sentence trees.
  - Syntax governs how sentences are structured in terms of noun, verb, and prepositional phrases.
  
  ![image-2.png](attachment:image-2.png)


- Syntactic Structure 
  - Sentence structure: Subject (NP) + Verb Phrase (VP), optionally followed by a Prepositional Phrase (PP).
  - Noun Phrase (NP): Determiner + Noun.
  - Verb Phrase (VP): Verb, optionally followed by a Noun Phrase and/or a Prepositional Phrase.
  - Prepositional Phrase (PP): Preposition + Noun Phrase.

- Examples 
  - *"The boy ate the pancakes"*:
    - The boy: NP, ate: Verb, the pancakes: NP.
  - *"The boy ate the pancakes under the door"*:
    - Syntactically correct, contextually incorrect.

- Text Syntax Components 
  - POS tags specify word functions (noun, verb, etc.).
  - Dependency grammar captures word relationships in sentences.

- Role of a Parser 

  ![image.png](attachment:image.png)
  - A parser checks syntax and builds a structure (e.g., parse tree).
  - It splits sentences into subjects and related phrases.
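A minimal sketch of what a parser does, assuming a toy context-free grammar and lexicon invented for the two example sentences above (the rules are hypothetical, not a standard English grammar): a backtracking recursive-descent parser that returns every parse tree as nested tuples. It also shows that *"The boy ate the pancakes under the door"* is structurally ambiguous, because the PP can attach either to the verb phrase or to the noun phrase *the pancakes*.

```python
# Toy grammar: nonterminal -> list of alternative right-hand sides (hypothetical).
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["Det", "N"], ["Det", "N", "PP"]],
    "VP": [["V", "NP"], ["V", "NP", "PP"]],
    "PP": [["P", "NP"]],
}

# Toy lexicon: word -> part-of-speech tag.
LEXICON = {
    "the": "Det", "boy": "N", "pancakes": "N", "door": "N",
    "ate": "V", "under": "P",
}


def parse(symbol, words, i):
    """Yield (tree, next_index) for every way `symbol` can cover words starting at i."""
    if symbol not in GRAMMAR:
        # Part-of-speech tag: it must match exactly one word.
        if i < len(words) and LEXICON.get(words[i]) == symbol:
            yield (symbol, words[i]), i + 1
        return
    for rhs in GRAMMAR[symbol]:
        yield from _expand(symbol, rhs, words, i, 0, [])


def _expand(label, rhs, words, pos, k, children):
    """Match rhs[k:] starting at `pos`, backtracking over every alternative."""
    if k == len(rhs):
        yield (label, *children), pos
        return
    for child, nxt in parse(rhs[k], words, pos):
        yield from _expand(label, rhs, words, nxt, k + 1, children + [child])


def parse_sentence(sentence):
    """Return all complete parse trees rooted at S."""
    words = sentence.lower().split()
    return [tree for tree, end in parse("S", words, 0) if end == len(words)]


for s in ["The boy ate the pancakes", "The boy ate the pancakes under the door"]:
    print(s, "->", len(parse_sentence(s)), "parse(s)")  # 1, then 2
```

Note that the grammar accepts the second sentence even though it is contextually odd: a syntactic parser checks structure, not plausibility.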


- Semantic Parsing 
  - Converts natural language into machine-understandable meaning.
  - Used in machine translation, QA, and code generation.

- Example 
  - *"The price of bananas increased by 5%"* — words like "increased" are predicates, and "the price of bananas" is an argument.


- Information Extraction 
  - Extracts relevant info from unstructured data.
  - Saves time and reduces human error.
  - Uses NLP algorithms for tasks like summarizing, extracting data from websites, etc.

- Web Scraping 
  - Collects raw data from websites using Python tools (e.g., urllib).
  - Be mindful of website terms and avoid overloading servers.
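A minimal standard-library sketch: `urllib.request.urlopen` fetches a page and `html.parser.HTMLParser` extracts link targets and visible text. The class and function names are illustrative, and the example parses an inline HTML string so it runs offline; a real scraper should also respect the site's terms and robots.txt and rate-limit its requests.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


class LinkTextExtractor(HTMLParser):
    """Collect href targets of <a> tags and non-empty text outside <script>/<style>."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.links = []
        self.text = []
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.text.append(data.strip())


def fetch_html(url, timeout=10):
    """Download a page (needs network access; not called in the offline example)."""
    with urlopen(url, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


def extract(html):
    parser = LinkTextExtractor()
    parser.feed(html)
    parser.close()
    return parser.links, " ".join(parser.text)


sample = """<html><head><script>var x = 1;</script></head>
<body><h1>Prices</h1><p>Bananas rose 5%.</p>
<a href="/news">News</a> <a href="https://example.com">Example</a></body></html>"""
links, text = extract(sample)
print(links)  # ['/news', 'https://example.com']
print(text)   # Prices Bananas rose 5%. News Example
```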

- Text Summarization 
  - Summarizes long texts into shorter, informative versions.
  - Helps save reading time and improve indexing.

  - Summarization Types 
    - **Input Type**: Single or multi-document.
    - **Purpose**: Generic, domain-specific, or query-based.
    - **Output**: 
      - Extractive: important sentences are selected from the input text to form the summary.
      - Abstractive: the model generates its own phrases and sentences, producing a more coherent summary, closer to what a human would write.
    
    ![image-3.png](attachment:image-3.png)
  
  - TextRank Algorithm 
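    - Graph-based extractive summarization: sentences are nodes, edges are weighted by how many words two sentences share, and a PageRank-style algorithm ranks the sentences.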
    
    ![image-4.png](attachment:image-4.png)
  
  - LexRank Algorithm 
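    - Ranks sentences by eigenvector centrality in a graph whose edges are cosine similarities between TF-IDF sentence vectors, so sentences similar to many others score highest.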

  - Latent Semantic Analysis (LSA)
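    - Latent Semantic Analysis is an unsupervised learning algorithm that can be used for extractive text summarization.
    - It applies singular value decomposition (SVD) to a term-sentence matrix to uncover latent topics, then selects the sentences that score highest on the most important topics.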
    
    ![image-5.png](attachment:image-5.png)
  
  - GPT Transformers 
    - Abstractive summarization using GPT-2 for generating human-like summaries.

    ![image-6.png](attachment:image-6.png)
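A minimal TextRank-style sketch in plain Python: sentences become graph nodes, edge weights use the word-overlap similarity from the original TextRank paper (shared words divided by the sum of the log sentence lengths), and a PageRank-style iteration with damping factor 0.85 scores the sentences. The regex sentence splitter and tokenizer are simplifying assumptions; practical implementations add stop-word removal and stemming.

```python
import math
import re


def split_sentences(text):
    """Naive splitter: break after ., ! or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    return re.findall(r"[a-z]+", sentence.lower())


def similarity(a, b):
    """TextRank overlap: |common words| / (log|a| + log|b|), 0 if undefined."""
    wa, wb = tokenize(a), tokenize(b)
    denom = math.log(len(wa)) + math.log(len(wb)) if wa and wb else 0.0
    if denom <= 0:
        return 0.0
    return len(set(wa) & set(wb)) / denom


def textrank(sentences, damping=0.85, iterations=50):
    n = len(sentences)
    w = [[similarity(sentences[i], sentences[j]) if i != j else 0.0
          for j in range(n)] for i in range(n)]
    out_weight = [sum(row) for row in w]  # total edge weight leaving each node
    scores = [1.0] * n
    for _ in range(iterations):
        scores = [
            (1 - damping) + damping * sum(
                w[j][i] / out_weight[j] * scores[j]
                for j in range(n) if out_weight[j] > 0)
            for i in range(n)
        ]
    return scores


def summarize(text, k=2):
    """Return the k highest-scoring sentences, kept in their original order."""
    sentences = split_sentences(text)
    scores = textrank(sentences)
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]


text = ("The cat sat on the mat. The cat chased the mouse. "
        "The dog chased the cat. Stocks fell sharply today.")
print(summarize(text, k=2))
```

The off-topic last sentence shares no words with the others, so it only receives the baseline score `1 - damping` and is never selected.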

- Anaphora Resolution 
  - Resolves pronouns or noun phrases (e.g., "He" refers to "John").

- Discourse Integration 
  - Examines how previous sentences affect the current sentence.
  - Example: the word “that” in the sentence “He wanted that” depends upon the prior discourse context.

- Pragmatic Analysis 
  - Interprets text meaning based on context and cooperation rules.
  - Example: “Close the window?” should be interpreted as a request rather than an order.

- Ontology in NLP 
  - Formal representation of domain knowledge (concepts, relationships).
  - Enhances understanding of words and sentences.
  - Ontology Types 
    - **Domain-Specific**: Healthcare, finance, etc.
    - **General-Purpose**: Common concepts.
    - **Upper Ontologies**: General, domain-independent frameworks on which more specific ontologies are built.
  - Benefits of Ontologies 
    - Improves accuracy and disambiguation.
    - Facilitates information sharing and scalability.
    - Web Ontology Language (OWL)
      - Represents knowledge in machine-readable format (e.g., for e-commerce, healthcare, and research).

$$\text{Statistical Approaches}$$


- Probability
    - Probability is the branch of mathematics that deals with how likely a random event is to occur.
    - Its value lies between 0 (impossible) and 1 (certain).
    - To find the probability of a single event, we first need to know the total number of possible outcomes.
    - A tree diagram helps organize and visualize the possible outcomes; its two main parts are the branches and the ends (leaves).
    - Tree diagrams show when to multiply and when to add: multiply the probabilities along a path of branches, and add the probabilities of different paths that lead to the same result.
        
        ![image.png](attachment:image.png)

    - There are three major types of probabilities:
        - Theoretical Probability
            - It is based on reasoning about the possible outcomes rather than on experiments.
            - For example, if a fair coin is tossed, the theoretical probability of getting heads is ½.
        - Experimental Probability
            - It is based on the observations of an experiment.
            - It is calculated as the number of times the event occurs divided by the total number of trials.
            - For example, if a coin is tossed 10 times and heads is recorded 6 times, the experimental probability of heads is 6/10, or 3/5.
        - Axiomatic Probability
            - In axiomatic probability, a set of rules (axioms) is defined that applies to all types of probability.
            - These axioms were formulated by Andrey Kolmogorov and are known as Kolmogorov’s three axioms: every probability is non-negative, the probability of the whole sample space is 1, and the probability of a union of mutually exclusive events is the sum of their probabilities.
    - Conditional Probability
        - Conditional probability is the likelihood of an event occurring given that another event has already occurred: P(A|B) = P(A ∩ B) / P(B), defined when P(B) > 0.
    - Probability of an Event
        - Assume an event E can occur in r ways out of n equally likely possible outcomes. Then the probability of the event occurring (its success) is:
            - P(E) = r/n
        - The probability that the event does not occur (its failure) is:
            - P(E’) = (n-r)/n = 1-(r/n)
        - E’ denotes the event that E does not occur.
        - Therefore:
            - P(E) + P(E’) = 1
        - In other words, the probabilities of all possible outcomes of a random experiment sum to 1.
        - Together with the axioms above, these rules define probability on a sample space, the set of all possible outcomes.
        - The function that assigns each set of outcomes in the sample space a value between 0 and 1 is called the probability measure.
    - Probability Density Function (PDF)
        - The Probability Density Function (PDF) describes the distribution of a continuous random variable: the probability that the variable falls within a range of values is the area under the PDF over that range.
        - For example, the PDF of the normal distribution is determined by its mean μ and standard deviation σ.
        - The standard normal distribution (μ = 0, σ = 1) is a common reference; normal distributions are often used in science to represent real-valued random variables whose true distribution is not known.
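The formulas above can be checked with a short sketch, assuming a fair six-sided die for the counting rules, a seeded coin-flip simulation for experimental probability, and the standard normal distribution for the PDF, whose area is estimated numerically with the trapezoid rule. The function names are illustrative.

```python
import math
import random
from fractions import Fraction

OUTCOMES = range(1, 7)  # fair six-sided die: n = 6 equally likely outcomes


def prob(event):
    """P(E) = r / n, where r counts the outcomes in E."""
    r = sum(1 for o in OUTCOMES if event(o))
    return Fraction(r, len(OUTCOMES))


def cond_prob(a, b):
    """P(A | B) = P(A and B) / P(B)."""
    return prob(lambda o: a(o) and b(o)) / prob(b)


def is_even(o):
    return o % 2 == 0


def greater_than_3(o):
    return o > 3


p_even = prob(is_even)
print(p_even, 1 - p_even)                  # 1/2 1/2  (P(E) and P(E'))
print(cond_prob(is_even, greater_than_3))  # 2/3      (outcomes 4 and 6 out of 4, 5, 6)


def experimental_prob(trials, seed=0):
    """Fraction of simulated fair-coin flips that land heads."""
    rng = random.Random(seed)
    heads = sum(rng.random() < 0.5 for _ in range(trials))
    return heads / trials


def normal_pdf(x, mu=0.0, sigma=1.0):
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))


def area_under(f, a, b, steps=10_000):
    """Trapezoid-rule estimate of the integral of f from a to b."""
    h = (b - a) / steps
    return h * (f(a) / 2 + sum(f(a + i * h) for i in range(1, steps)) + f(b) / 2)


print(experimental_prob(10_000))                # close to 0.5, but not exactly 0.5
print(round(area_under(normal_pdf, -1, 1), 4))  # ≈ 0.6827: P(-1 <= X <= 1)
```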


- Markov Model
    - A stochastic model representing systems that change over time probabilistically
    - Key feature is memorylessness: future state depends only on present state
    - Uses discrete state space and time with constant transition probabilities

- Applications and Types
    - Used in stock prices, weather patterns, queuing networks, and text generation
    - Hidden Markov Models (HMMs) include unobservable states
    - Markov Decision Processes (MDPs) incorporate decision-making elements

- Implementation Steps
    - Define possible system states and their characteristics
    - Calculate transition probabilities between states using matrices
    - Set initial state and analyze future state probabilities
    - Apply model for predictions and system analysis
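Following the implementation steps above, a minimal sketch of a two-state weather chain. The states and transition probabilities are made-up values: each row of the matrix gives P(next state | current state) and sums to 1, the distribution is pushed forward by multiplying it with the matrix, and a sampled path picks each next state using only the current one (memorylessness).

```python
import random

STATES = ["sunny", "rainy"]
# TRANSITIONS[i][j] = P(next = STATES[j] | current = STATES[i]); hypothetical values.
TRANSITIONS = [
    [0.9, 0.1],
    [0.5, 0.5],
]


def step(dist):
    """One step of the chain: new_dist[j] = sum_i dist[i] * P[i][j]."""
    n = len(dist)
    return [sum(dist[i] * TRANSITIONS[i][j] for i in range(n)) for j in range(n)]


def distribution_after(start_state, steps):
    dist = [1.0 if s == start_state else 0.0 for s in STATES]
    for _ in range(steps):
        dist = step(dist)
    return dist


def sample_path(start_state, steps, seed=0):
    """Simulate one trajectory; each next state depends only on the current one."""
    rng = random.Random(seed)
    i = STATES.index(start_state)
    path = [STATES[i]]
    for _ in range(steps):
        i = rng.choices(range(len(STATES)), weights=TRANSITIONS[i])[0]
        path.append(STATES[i])
    return path


print(distribution_after("rainy", 1))    # [0.5, 0.5]
print(distribution_after("rainy", 100))  # approaches the stationary distribution [5/6, 1/6]
print(sample_path("sunny", 5))
```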

- Text Classification
    - Process of organizing text data into defined groups
    - Applied in spam detection, sentiment analysis, and language detection
    - Uses both rule-based approaches and machine learning techniques

    ![image-2.png](attachment:image-2.png)

- Feature Extraction Methods
    - Bag of Words: Tracks word occurrence without considering order

    ![image-3.png](attachment:image-3.png)


    - N-grams: Captures word sequences (bigrams, trigrams) for context
    - TF*IDF: Weighs a term by how often it appears in a document (TF), discounted by how many documents in the corpus contain it (IDF)

        ![image-4.png](attachment:image-4.png)
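A minimal plain-Python sketch of the three feature types on an invented toy corpus. It assumes whitespace tokenization and the common TF-IDF variant tf(t, d) × log(N / df(t)) with raw counts as tf; libraries such as scikit-learn use smoothed and normalized variants, so their numbers differ.

```python
import math
from collections import Counter

docs = [
    "win a free prize now",
    "a meeting schedule for monday",
    "click a link for a free prize",
]


def tokenize(doc):
    return doc.lower().split()


def bag_of_words(doc):
    """Word -> count; word order is discarded."""
    return Counter(tokenize(doc))


def ngrams(doc, n=2):
    """Contiguous n-word sequences, which keep some local word order."""
    tokens = tokenize(doc)
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def tf_idf(corpus):
    """Return one {term: weight} dict per document."""
    bags = [bag_of_words(d) for d in corpus]
    n_docs = len(corpus)
    df = Counter(term for bag in bags for term in bag)  # number of documents containing term
    return [{t: c * math.log(n_docs / df[t]) for t, c in bag.items()} for bag in bags]


print(bag_of_words(docs[2]))  # 'a' is counted twice
print(ngrams(docs[0]))        # [('win', 'a'), ('a', 'free'), ('free', 'prize'), ('prize', 'now')]
weights = tf_idf(docs)
print(sorted(weights[0].items(), key=lambda kv: -kv[1]))
```

Because "a" occurs in every document, its IDF is log(3/3) = 0 and it gets zero weight, while "win" (one document) outweighs "prize" (two documents).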

- Clustering Techniques
    
    ![image-5.png](attachment:image-5.png)

    - K-means: Organizes data into disjoint clusters with high intra-cluster similarity
        
        ![image-6.png](attachment:image-6.png)

    - Hierarchical: Groups similar objects progressively into nested clusters

        ![image-9.png](attachment:image-9.png)

        ![image-10.png](attachment:image-10.png)

    - Different linkage methods: single, complete, average, and centroid

        ![image-8.png](attachment:image-8.png)
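A minimal sketch of K-means (Lloyd's algorithm) on invented 2-D points: alternate between assigning each point to its nearest centroid and moving each centroid to the mean of its points, and stop when the assignments no longer change. Initializing with the first k points is a simplifying assumption; practical implementations use random or k-means++ initialization with several restarts.

```python
import math


def kmeans(points, k, max_iter=100):
    centroids = [list(p) for p in points[:k]]  # naive initialization
    labels = None
    for _ in range(max_iter):
        # Assignment step: nearest centroid by Euclidean distance.
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                      for p in points]
        if new_labels == labels:
            break  # converged: no point changed cluster
        labels = new_labels
        # Update step: each centroid becomes the mean of its assigned points.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[c] = [sum(coord) / len(members) for coord in zip(*members)]
    return labels, centroids


points = [(1.0, 1.0), (8.0, 8.0), (1.5, 2.0), (9.0, 8.5), (0.5, 1.2), (8.5, 9.5)]
labels, centroids = kmeans(points, k=2)
print(labels)     # [0, 1, 0, 1, 0, 1]
print(centroids)  # approximately [[1.0, 1.4], [8.5, 8.67]]
```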
    

- Support Vector Machines (SVM)

    ![image-11.png](attachment:image-11.png)
    
    - Supervised learning algorithm for classification and regression
    - Works by finding optimal hyperplane to separate classes
    - Uses kernel tricks for non-linear classification problems
    - SVM Components
        - Support vectors: Points closest to hyperplane
        - Margin: Distance between the hyperplane and the nearest points of each class; SVM chooses the hyperplane that maximizes it
        - Kernels: Transform data into higher dimensions
            - Types include linear, polynomial, RBF, and sigmoid
            - RBF kernel corresponds to a mapping into an infinite-dimensional feature space

            ![image-12.png](attachment:image-12.png)

            ![image-13.png](attachment:image-13.png)
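A minimal sketch of a linear soft-margin SVM trained by batch sub-gradient descent on the hinge loss, plus the RBF kernel formula. It is a teaching sketch on invented, linearly separable 2-D data, and the hyperparameters `lam`, `lr` and `epochs` are arbitrary choices; real SVM libraries (for example LIBSVM, on which scikit-learn's `SVC` is based) solve the dual problem and apply kernels directly.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Minimize lam/2 * ||w||^2 + mean(max(0, 1 - y * (w.x + b))) by sub-gradient descent."""
    d, n = len(X[0]), len(X)
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]  # gradient of the regularizer
        grad_b = 0.0
        for xi, yi in zip(X, y):
            if yi * (dot(w, xi) + b) < 1:  # inside the margin or misclassified
                for j in range(d):
                    grad_w[j] -= yi * xi[j] / n
                grad_b -= yi / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(w, b, x):
    return 1 if dot(w, x) + b >= 0 else -1


def rbf_kernel(a, b, gamma=0.5):
    """K(a, b) = exp(-gamma * ||a - b||^2): 1 for identical points, toward 0 as they separate."""
    return math.exp(-gamma * sum((p - q) ** 2 for p, q in zip(a, b)))


X = [(2, 2), (3, 3), (2, 3), (-2, -2), (-3, -3), (-2, -3)]
y = [1, 1, 1, -1, -1, -1]
w, b = train_linear_svm(X, y)
print(w, b)
print([predict(w, b, x) for x in X])  # matches y on this separable toy data
```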

- Nearest Centroid Classification
    - Assigns observations to class with closest mean
    - Applied in text classification using tf*idf weights
    - Known as Rocchio classifier when used with word vectors

        ![image-14.png](attachment:image-14.png)
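A minimal sketch of nearest-centroid classification on invented 2-D data: training computes the mean vector of each class, and prediction picks the class whose mean is closest in Euclidean distance. In the Rocchio text-classification setting described above, the points would be tf*idf document vectors instead.

```python
import math
from collections import defaultdict


def fit_centroids(X, y):
    """Return {label: mean vector} computed from the training points."""
    groups = defaultdict(list)
    for x, label in zip(X, y):
        groups[label].append(x)
    return {label: [sum(col) / len(pts) for col in zip(*pts)]
            for label, pts in groups.items()}


def predict(centroids, x):
    """Assign x to the class with the closest mean."""
    return min(centroids, key=lambda label: math.dist(x, centroids[label]))


X = [(1, 2), (2, 1), (2, 3), (7, 8), (8, 7), (9, 9)]
y = ["A", "A", "A", "B", "B", "B"]
centroids = fit_centroids(X, y)
print(centroids)                   # {'A': [1.666..., 2.0], 'B': [8.0, 8.0]}
print(predict(centroids, (3, 3)))  # A
```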

$$\text{Linear Algebra}$$

- Introduction to Linear Algebra

- Linear algebra is a fundamental branch of mathematics that deals with:
     - Linear equations and their transformations
     - Vector spaces
     - Matrices and their operations
     - Applications in various fields like computer science, engineering, and data science
          - Key Applications
               1. Computer Graphics: Transformations, rotations, and scaling of images
               2. Machine Learning: Data processing, neural networks, dimensionality reduction
               3. Engineering: System modeling and analysis
               4. Statistics: Data analysis and correlation studies
     - Linear Algebra in Machine Learning
          1. Dataset and Data Files
          2. Images and Photographs
          3. One Hot Encoding
          4. Linear Regression
          5. Regularization
          6. Principal Component Analysis
          7. Singular-Value Decomposition
          8. Latent Semantic Analysis
          9. Recommender Systems
          10. Deep Learning     

- Vectors
     - A vector is an ordered collection of numbers called scalars.
          - Example:
          ```python
          v = [3, -2, 1]  # 3-dimensional vector
          w = [4, 7]      # 2-dimensional vector
          ``` 

- Vector Operations
     1. Addition
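          ```python
          import numpy as np

          # Plain Python lists would concatenate with +, so use NumPy arrays.
          a = np.array([1, 2, 3])
          b = np.array([4, 5, 6])
          c = a + b  # array([5, 7, 9])
          ```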

     2. Subtraction
          ```python
          import numpy as np

          a = np.array([4, 8, 2])
          b = np.array([1, 3, 1])
          c = a - b  # array([3, 5, 1])
          ```
     3. Element-wise Multiplication
          ```python
          import numpy as np

          a = np.array([4, 8, 2])
          b = np.array([1, 3, 1])
          c = a * b  # [4*1, 8*3, 2*1] = array([4, 24, 2])
          ```

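     4. Element-wise Division
          ```python
          import numpy as np

          a = np.array([4, 8, 2])
          b = np.array([1, 3, 1])
          c = a / b   # [4/1, 8/3, 2/1] ≈ array([4.0, 2.667, 2.0])
          d = a // b  # floor division: array([4, 2, 2])
          ```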

     5. Scalar Multiplication
          ```python
          import numpy as np

          v = np.array([2, 3, 4])
          s = 2
          result = s * v  # array([4, 6, 8])
          ```

     6. Dot Product
          ```python
          import numpy as np

          a = np.array([1, 2, 3])
          b = np.array([4, 5, 6])
          dot_product = np.dot(a, b)  # (1*4) + (2*5) + (3*6) = 32
          ```

- Vector Norms
     - L1 Norm (Manhattan)
          ```python
          import numpy as np

          v = np.array([3, -4, 2])
          L1 = np.linalg.norm(v, ord=1)  # |3| + |-4| + |2| = 9.0
          ```

     - L2 Norm (Euclidean)
          ```python
          import numpy as np

          v = np.array([3, 4])
          L2 = np.linalg.norm(v)  # sqrt(3**2 + 4**2) = sqrt(25) = 5.0
          ```

- Matrices
     - A matrix is a 2D array of numbers arranged in rows and columns.
          - Example:
          ```python
          A = [
          [1, 2, 3],
          [4, 5, 6],
          [7, 8, 9]
          ]
          ```
- Matrix Operations
     - Addition
          ```python
          import numpy as np

          A = np.array([[1, 2],
                        [3, 4]])
          B = np.array([[5, 6],
                        [7, 8]])
          C = A + B  # [[ 6,  8],
                     #  [10, 12]]
          ```
     - Subtraction
          ```python
          import numpy as np

          A = np.array([[1, 2],
                        [3, 4]])
          B = np.array([[5, 6],
                        [7, 8]])
          C = A - B  # [[-4, -4],
                     #  [-4, -4]]
          ```

     - Scalar Multiplication
          ```python
          import numpy as np

          A = np.array([[1, 2],
                        [3, 4]])
          s = 2
          result = s * A  # [[2, 4],
                          #  [6, 8]]
          ```

     - Matrix Multiplication
          ```python
          import numpy as np

          A = np.array([[1, 2],
                        [3, 4]])
          B = np.array([[5, 6],
                        [7, 8]])
          C = A @ B  # [[1*5 + 2*7, 1*6 + 2*8],     [[19, 22],
                     #  [3*5 + 4*7, 3*6 + 4*8]]  =   [43, 50]]
          ```

- Linear Equations
     - System of Linear Equations Example
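          ```python
          import numpy as np

          # 2x + y = 5
          # -x + y = 2
          coeffs = np.array([[2, 1],
                             [-1, 1]])
          rhs = np.array([5, 2])
          x, y = np.linalg.solve(coeffs, rhs)  # x = 1, y = 3 (returned as floats)
          ```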

     - Matrix Form - The above system can be written as:
          $$\begin{bmatrix} 2 & 1 \\ -1 & 1 \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} 5 \\ 2 \end{bmatrix}$$

- Matrix Decomposition

     - LU Decomposition
          - Splits matrix A into Lower and Upper triangular matrices:
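               ```python
               import numpy as np

               A = np.array([[4, 3],
                             [6, 3]])
               L = np.array([[1, 0],
                             [1.5, 1]])   # lower triangular, ones on the diagonal
               U = np.array([[4, 3],
                             [0, -1.5]])  # upper triangular
               assert np.allclose(L @ U, A)  # L times U reconstructs A
               # scipy.linalg.lu(A) pivots rows, so it returns P, L, U with A = P @ L @ U
               ```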
     - Eigendecomposition
          - For a matrix A:
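               ```python
               import numpy as np

               A = np.array([[4, 2],
                             [1, 3]])
               eigenvalues, eigenvectors = np.linalg.eig(A)
               # eigenvalues: 5 and 2 (the order is not guaranteed)
               # the columns of `eigenvectors` are unit-length and proportional to
               # v1 = [2, 1] (for λ1 = 5) and v2 = [-1, 1] (for λ2 = 2)
               v1 = np.array([2, 1])
               assert np.allclose(A @ v1, 5 * v1)  # A v = λ v
               ```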

- Practical Applications
     - Machine Learning Example
          - Linear regression in matrix form:
               $$y = X\beta + \varepsilon$$
               where $y$ is the target variable, $X$ is the feature matrix, $\beta$ holds the coefficients, and $\varepsilon$ is the error term.

     - Computer Graphics Example
          - 2D rotation matrix:
               ```python
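               import numpy as np

               def rotation(theta_deg):
                   """R(θ) = [[cos θ, -sin θ], [sin θ, cos θ]] for an angle in degrees."""
                   t = np.radians(theta_deg)
                   return np.array([[np.cos(t), -np.sin(t)],
                                    [np.sin(t),  np.cos(t)]])

               R90 = rotation(90)  # ≈ [[0, -1], [1, 0]], up to floating-point rounding
               print(R90 @ np.array([1, 0]))  # rotates (1, 0) to approximately (0, 1)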
               ```
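A minimal sketch of fitting y = Xβ + ε by ordinary least squares, which solves the normal equations (XᵀX)β = Xᵀy. It assumes a single feature plus an intercept column and invented, noise-free data from y = 1 + 2x, so the 2×2 system can be solved with Cramer's rule in plain Python; with NumPy one would typically call `np.linalg.lstsq` instead.

```python
def transpose(M):
    return [list(col) for col in zip(*M)]


def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def fit_ols_2x2(X, y):
    """Solve (X^T X) beta = X^T y for a design matrix with exactly two columns."""
    Xt = transpose(X)
    (a, b), (c, d) = matmul(Xt, X)                          # X^T X
    e, f = (v[0] for v in matmul(Xt, [[yi] for yi in y]))  # X^T y
    det = a * d - b * c
    return [(e * d - b * f) / det, (a * f - e * c) / det]   # Cramer's rule


xs = [0.0, 1.0, 2.0, 3.0]
X = [[1.0, x] for x in xs]       # the column of ones gives the intercept
y = [1.0 + 2.0 * x for x in xs]  # true beta = [1, 2], error term = 0
print(fit_ols_2x2(X, y))         # [1.0, 2.0]
```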



$$\text{RNN, GRU and LSTM}$$

-  1. Recurrent Neural Networks (RNNs):
    - Purpose: Handle sequential data like time series, text, audio, or video by maintaining a "memory" of previous inputs.
    - Structure: Loops through input data, processing one element at a time and passing information forward.
    - Limitation: Vanishing gradient problem, which makes it hard to learn long-term dependencies.
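A minimal sketch of the recurrence and of why gradients vanish, assuming a toy RNN with a single hidden unit and fixed, hypothetical weights (no training happens): h_t = tanh(w_x·x_t + w_h·h_{t-1} + b). By the chain rule, ∂h_T/∂x_1 is a product containing one factor w_h·(1 − h_t²) per later step; with |w_h| < 1 every factor is below 1 in magnitude, so the influence of the first input shrinks as the sequence grows.

```python
import math


def rnn_forward(xs, w_x=0.5, w_h=0.8, b=0.0, h0=0.0):
    """Return the hidden state after each step of a one-unit tanh RNN."""
    h, states = h0, []
    for x in xs:
        h = math.tanh(w_x * x + w_h * h + b)
        states.append(h)
    return states


def grad_last_wrt_first_input(xs, w_x=0.5, w_h=0.8, b=0.0):
    """d h_T / d x_1 for the same RNN, via the chain rule through every step."""
    hs = rnn_forward(xs, w_x, w_h, b)
    grad = w_x * (1 - hs[0] ** 2)   # d h_1 / d x_1  (tanh' = 1 - tanh^2)
    for h in hs[1:]:
        grad *= w_h * (1 - h ** 2)  # d h_t / d h_{t-1}
    return grad


for T in (2, 5, 20, 50):
    xs = [1.0] + [0.0] * (T - 1)    # a single "signal" at the first step
    print(T, grad_last_wrt_first_input(xs))
```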


-  2. Long Short-Term Memory Networks (LSTMs):
    - Purpose: Address the vanishing gradient problem in standard RNNs, enabling better handling of long-term dependencies.
    - Structure:
        - Incorporates cell states and gates (input, forget, and output gates).
        - Gates control how much information is retained, forgotten, or passed forward.
    - Use Case: Time-series forecasting, text generation, speech recognition.
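A minimal sketch of one LSTM step, assuming a single hidden unit so every weight is a scalar (all parameter values are hypothetical). The cell state is updated additively, c_t = f·c_{t-1} + i·g, which is what lets information pass through many steps when the forget gate stays close to 1 and the input gate stays close to 0.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x, h_prev, c_prev, p):
    """One step of a single-unit LSTM; `p` maps names to scalar weights/biases."""
    f = sigmoid(p["wf"] * x + p["uf"] * h_prev + p["bf"])    # forget gate
    i = sigmoid(p["wi"] * x + p["ui"] * h_prev + p["bi"])    # input gate
    o = sigmoid(p["wo"] * x + p["uo"] * h_prev + p["bo"])    # output gate
    g = math.tanh(p["wg"] * x + p["ug"] * h_prev + p["bg"])  # candidate values
    c = f * c_prev + i * g  # keep part of the old memory, add part of the new
    h = o * math.tanh(c)    # expose a gated view of the memory
    return h, c


# Hypothetical parameters: all weights 0, forget gate pushed open, input gate shut.
params = {k: 0.0 for k in ("wf", "uf", "wi", "ui", "wo", "uo", "wg", "ug", "bo", "bg")}
params.update(bf=10.0, bi=-10.0)

h, c = 0.0, 1.0
for x in [0.3, -1.2, 0.7, 2.0]:
    h, c = lstm_step(x, h, c, params)
print(c)  # still close to 1.0: the cell state was almost fully carried through
```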


-  3. Gated Recurrent Units (GRUs):
    - Purpose: Similar to LSTMs but more computationally efficient with fewer parameters.
    - Structure:
        - Combines the input and forget gates into a single update gate.
        - Uses a reset gate to control how much of the previous hidden state is used when computing the new candidate state.
    - Use Case: Alternative to LSTMs when computational efficiency is crucial.
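A minimal single-unit GRU step with hypothetical scalar parameters. It uses the formulation h_t = (1 − z)·h_{t−1} + z·h̃, in which the update gate z decides how much of the candidate state h̃ replaces the old state; some papers and libraries swap the roles of z and 1 − z. The parameter-count helper assumes one bias vector per gate block, so exact counts in a given framework may differ, but the 3-versus-4 block ratio is what makes GRUs lighter than LSTMs.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def gru_step(x, h_prev, p):
    """One step of a single-unit GRU; `p` maps names to scalar weights/biases."""
    z = sigmoid(p["wz"] * x + p["uz"] * h_prev + p["bz"])  # update gate
    r = sigmoid(p["wr"] * x + p["ur"] * h_prev + p["br"])  # reset gate
    # Candidate state: the reset gate decides how much of h_prev it may use.
    h_tilde = math.tanh(p["wh"] * x + p["uh"] * (r * h_prev) + p["bh"])
    return (1 - z) * h_prev + z * h_tilde


def recurrent_params(input_size, hidden_size, gate_blocks):
    """Input weights + recurrent weights + one bias vector, per gate block."""
    return gate_blocks * (hidden_size * input_size + hidden_size * hidden_size + hidden_size)


params = {k: 0.5 for k in ("wz", "uz", "bz", "wr", "ur", "br", "wh", "uh", "bh")}
h = 0.0
for x in [1.0, -0.5, 0.25]:
    h = gru_step(x, h, params)
print(h)

print(recurrent_params(100, 128, gate_blocks=4))  # LSTM-style, 4 blocks: 117248
print(recurrent_params(100, 128, gate_blocks=3))  # GRU-style, 3 blocks: 87936
```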


![image-3.png](attachment:image-3.png)

$$\text{Seq2Seq}$$


- Sequence-to-Sequence (Seq2Seq) Models:
    - Purpose: Map input sequences to output sequences of varying lengths, such as translation or summarization.
    - Structure: Composed of two parts, an Encoder and a Decoder (both typically RNNs, LSTMs, or GRUs).
        - Encoder: Processes the input sequence into a fixed-length context vector.
        - Decoder: Uses this context vector to generate the output sequence.
    - Enhancements:
        - Attention Mechanism: Allows the model to focus on specific parts of the input sequence while generating outputs, improving performance on longer sequences.


![image.png](attachment:image.png)
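A minimal sketch of one attention step in an encoder-decoder model, assuming dot-product (Luong-style) scoring and made-up 2-D hidden states: the current decoder state is scored against every encoder state, a softmax turns the scores into weights, and the context vector is the weighted sum of the encoder states. Without attention, the decoder would rely on a single fixed context vector for the whole output.

```python
import math


def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(decoder_state, encoder_states):
    """Return (attention weights, context vector) for one decoding step."""
    scores = [sum(d * e for d, e in zip(decoder_state, enc)) for enc in encoder_states]
    weights = softmax(scores)
    dim = len(encoder_states[0])
    context = [sum(w * enc[k] for w, enc in zip(weights, encoder_states)) for k in range(dim)]
    return weights, context


# Hypothetical encoder outputs for a 3-token source sentence.
encoder_states = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
decoder_state = [2.0, 0.1]  # points mostly toward the first source token

weights, context = attend(decoder_state, encoder_states)
print([round(w, 3) for w in weights])  # largest weight on source token 0
print(context)
```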




$$\text {Comparison Chart} $$

| **Model**       | **Handles Long-Term Dependencies?** | **Efficiency** | **Complexity** | **Use Case**                     |
|------------------|-------------------------------------|----------------|----------------|-----------------------------------|
| **RNN**          | No                                | High           | Simple         | Basic sequence processing         |
| **LSTM**         | Yes                               | Moderate       | Higher         | Time-series, text, long sequences |
| **GRU**          | Yes                               | Higher         | Moderate       | Efficient alternative to LSTMs    |
| **Seq2Seq**      | Yes                               | Moderate       | Complex        | Translation, summarization        |



$$\text{Word2Vec Models}$$


- Word Embedding
    - Represents words as vectors in a continuous vector space.
    - Captures semantic meaning; similar words are closer in space.
    - Techniques: Word2Vec (Google), GloVe (Stanford), FastText (Facebook).
    - Uses of Word Embedding
        1. Semantic Similarity: Group similar words, e.g., fruits like apple, mango.
        2. Text Classification: Converts text to numeric vectors for training.
        3. NLP Tasks: Used in clustering, sentiment analysis, POS tagging.
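A minimal sketch of how "closer in space" is usually measured, cosine similarity, using tiny 3-dimensional vectors invented for illustration; real Word2Vec, GloVe, or FastText vectors are learned from data and typically have a few hundred dimensions.

```python
import math

# Made-up embeddings; only their relative directions matter here.
embeddings = {
    "apple": [0.9, 0.8, 0.1],
    "mango": [0.85, 0.75, 0.2],
    "car":   [0.1, 0.2, 0.95],
}


def cosine(u, v):
    """cos(u, v) = u.v / (|u| |v|), in [-1, 1]; higher means more similar."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def most_similar(word):
    others = [w for w in embeddings if w != word]
    return max(others, key=lambda w: cosine(embeddings[word], embeddings[w]))


print(most_similar("apple"))  # mango
```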



- Word2Vec
    - Learns vector representations of words by predicting word-context relationships.
    - Developed by Tomas Mikolov at Google.
    - Architectures:
        - CBOW: Predicts a word from its context.
        - Skip-Gram: Predicts context from a word.
        - Efficient, captures syntactic and semantic relationships.
            
            ![image.png](attachment:image.png)


- CBOW vs. Skip-Gram
    - CBOW: Faster, better for frequent words.
    - Skip-Gram: Handles rare words better, needs less training data.
        
        ![image-3.png](attachment:image-3.png)
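A minimal sketch of how the two architectures frame the prediction task, assuming a context window of 2 words on each side: the same sentence yields (context, target) examples for CBOW and (target, context word) pairs for Skip-Gram. Training the actual embeddings (with a softmax or negative sampling) is not shown.

```python
def cbow_examples(tokens, window=2):
    """(context words, target word) pairs: CBOW predicts the target from its context."""
    examples = []
    for i, target in enumerate(tokens):
        context = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
        examples.append((context, target))
    return examples


def skipgram_pairs(tokens, window=2):
    """(target word, one context word) pairs: Skip-Gram predicts each context word."""
    return [(target, ctx) for context, target in cbow_examples(tokens, window)
            for ctx in context]


tokens = "the quick brown fox jumps".split()
print(cbow_examples(tokens)[2])    # (['the', 'quick', 'fox', 'jumps'], 'brown')
print(skipgram_pairs(tokens)[:3])  # [('the', 'quick'), ('the', 'brown'), ('quick', 'the')]
```

Each CBOW example becomes several Skip-Gram training pairs, one per context word, which is one reason Skip-Gram trains more slowly.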


- GloVe (Global Vectors for Word Representation)
    - Combines global co-occurrence statistics and word-context relationships.
    - Creates embeddings with meaningful linear structures in vector space.

        ![image-4.png](attachment:image-4.png)
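GloVe starts from a word-word co-occurrence matrix computed over the whole corpus. A minimal sketch of that counting step, assuming a symmetric window of 2 and the 1/distance weighting described in the GloVe paper, on a two-sentence toy corpus; the weighted least-squares training that turns these counts into vectors is not shown.

```python
from collections import defaultdict


def cooccurrence(sentences, window=2):
    """X[(w, c)] = sum over occurrences of 1/distance for each c within `window` of w."""
    X = defaultdict(float)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, word in enumerate(tokens):
            for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
                if j != i:
                    X[(word, tokens[j])] += 1.0 / abs(i - j)
    return X


corpus = ["ice is cold", "steam is hot"]
X = cooccurrence(corpus)
print(X[("ice", "is")], X[("ice", "cold")])  # 1.0 0.5
```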

- Seq2Seq
    - A model designed for mapping sequences to sequences (e.g., translation, summarization).
    - Comprises:
        - Encoder: Converts input sequence to a context vector.
        - Decoder: Generates output sequence from the context vector.
    - Often enhanced by Attention Mechanisms.

        ![image-2.png](attachment:image-2.png)

$$\text{Transformers}$$

- Transformers 

    - Seq2Seq Model Challenges
        - Struggles with long-range dependencies.
        - Sequential processing hinders parallelization.

    - Transformer Overview
        - Introduced in *"Attention Is All You Need"*.
        - Relies on self-attention, no RNNs or convolutions.
        - Efficient for sequence-to-sequence tasks.
        - Core Concepts
            - Uses attention mechanisms instead of sequential RNNs.
            - Focuses on relevant input parts for each token.
            - Highly parallelizable and efficient.

        - Workflow Highlights
            1. **Input Processing**: Text → Embeddings → Positional Encoding.
            2. **Encoder**: Self-attention + Feedforward Network.
            3. **Decoder**: Masked Self-attention + Cross-attention with the encoder.

        - Transformer Advantages
            - Self-attention for focus on different input parts.
            - Parallel processing boosts efficiency.
            - Handles longer sequences effectively.

        - Transformer Limitations
            - Fixed-length context leads to context fragmentation.

            ![image.png](attachment:image.png)
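A minimal plain-Python sketch of scaled dot-product attention, softmax(QKᵀ/√d_k)·V, the operation behind both encoder self-attention and decoder masked self-attention. It assumes Q, K and V are already given as small lists of vectors; in a real Transformer they come from learned linear projections of the embeddings and several heads run in parallel. The optional causal mask stops each position from attending to later positions.

```python
import math


def softmax(row):
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    s = sum(exps)
    return [e / s for e in exps]


def scaled_dot_product_attention(Q, K, V, causal=False):
    """Return (outputs, weights); row i of weights is how much token i attends to each token."""
    d_k = len(K[0])
    weights = []
    for i, q in enumerate(Q):
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d_k) for k in K]
        if causal:
            # Masked self-attention: later positions get -inf, i.e. weight 0.
            scores = [s if j <= i else float("-inf") for j, s in enumerate(scores)]
        weights.append(softmax(scores))
    outputs = [[sum(w * v[c] for w, v in zip(row, V)) for c in range(len(V[0]))]
               for row in weights]
    return outputs, weights


# Hypothetical 3-token sequence with d_k = 2; Q = K = V for simplicity.
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
out, w = scaled_dot_product_attention(X, X, X)
out_masked, w_masked = scaled_dot_product_attention(X, X, X, causal=True)
print(w_masked[0])  # [1.0, 0.0, 0.0]: the first token can only see itself
```

Every row of weights is computed independently of the others, which is why attention parallelizes far better than an RNN's step-by-step recurrence.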

- Transformer-XL Enhancements
    - Addresses fixed-length limitations of standard Transformers.

    - Innovations
        1. **Segment-Level Recurrence**: Carries memory across segments for extended context.
        2. **Relative Positional Encoding**: Focuses on relationships between words.

    - Workflow Highlights
        - Processes input in segments with memory reuse.
        - Uses multi-head self-attention with recurrence.

    - Benefits
        - Handles long-term dependencies.
        - Avoids context fragmentation.

    - Limitations
        - Training and evaluation are computationally intensive.
    - Using Transformers for Language Modeling
        
        ![image-2.png](attachment:image-2.png)

        ![image-4.png](attachment:image-4.png)
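
Segment-level recurrence can be illustrated with a toy that only tracks which tokens each segment can see. This sketch simplifies the real mechanism. Transformer-XL caches per-layer hidden states (with gradients stopped); here the cached "states" are just the tokens themselves. `segment_contexts`, `seg_len` and `mem_len` are hypothetical names.

```python
def segment_contexts(tokens, seg_len, mem_len):
    """For each segment, return (memory, segment): what the segment can attend to.

    Toy stand-in: Transformer-XL caches hidden states per layer; here the
    cached "states" are just the tokens.
    """
    memory, contexts = [], []
    for start in range(0, len(tokens), seg_len):
        segment = tokens[start:start + seg_len]
        contexts.append((list(memory), segment))
        # Keep only the most recent mem_len states for the next segment.
        memory = (memory + segment)[-mem_len:] if mem_len > 0 else []
    return contexts


for memory, segment in segment_contexts(list("abcdefgh"), seg_len=3, mem_len=3):
    print("memory:", "".join(memory) or "-", "| segment:", "".join(segment))
```

With `mem_len=0` this reduces to a vanilla fixed-length Transformer: each segment sees only itself, which is the context fragmentation described above. The memory tokens come before the current segment, and every segment restarts absolute positions at 0, so absolute positions would collide across segments. Transformer-XL avoids this by encoding relative distances instead.
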

$$\text{BERT}$$


- What is BERT?  
    - BERT (Bidirectional Encoder Representations from Transformers) is an open-source machine learning framework for NLP developed by Google.
        - It helps computers understand ambiguous language by using context from surrounding text.
        - BERT is built from the encoder stack of the Transformer, a deep learning architecture whose attention mechanism dynamically weighs how strongly each token relates to every other token.
    - Key Features of BERT
        1. Bidirectionality:
            - Unlike previous models that read text sequentially (left-to-right or right-to-left), BERT processes text in both directions simultaneously.
        2. Pre-training Tasks:
            - Masked Language Modeling (MLM): Predicts randomly masked words in a sentence.
            - Next Sentence Prediction (NSP): Determines if a sentence logically follows another.
    - Background
        - Transformers:
            - Introduced by Google in 2017.
            - Improved on RNNs and CNNs: attention connects any two positions directly, and tokens need not be processed one at a time in order, so training parallelizes well.
                ![image.png](attachment:image.png)
        - Pre-training Data:
            - BERT was trained on large datasets, including:
                - Wikipedia: 2,500 million words.
                - BookCorpus: 800 million words.
    - Architecture
        1. Variants:
            - BERT Base: 12 layers, 12 attention heads, 110 million parameters.
            - BERT Large: 24 layers, 16 attention heads, 340 million parameters.
        2. Input Embeddings (summed element-wise to form each token's input representation):
            - Position Embeddings: Encodes word positions.
            - Segment Embeddings: Distinguishes between sentence pairs.
            - Token Embeddings: Represents individual words or subwords.
    - Text Processing
        ![image-2.png](attachment:image-2.png)
    - Applications and Pre-trained Models
        - BERT has been adapted into specialized models through fine-tuning, further pre-training on domain text, or distillation:
            - PatentBERT: Patent classification.
            - BioBERT: Biomedical text mining.
            - VideoBERT: Combines visual and linguistic data for video processing.
            - SciBERT: Focused on scientific texts.
            - DistilBERT: A smaller, faster distilled variant by Hugging Face.
            - TinyBERT: Optimized for efficiency by Huawei.

    - Significance
        - BERT is particularly effective in handling polysemy (words with multiple meanings) and ambiguity in text.
        - It has achieved breakthroughs in various NLP tasks like sentiment analysis, sentence classification, and semantic role labeling.
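
A sketch of how training inputs are corrupted makes the MLM pre-training task concrete. In the original BERT recipe, 15% of token positions are selected as prediction targets. Of those, 80% are replaced by [MASK], 10% by a random token, and 10% are left unchanged. The sketch below simplifies this in three ways:
- It draws each position independently, so the masked fraction is 15% only on average.
- It works on whole words rather than WordPiece subwords.
- It does not skip special tokens such as [CLS] and [SEP].

`mask_tokens` is a hypothetical helper, not Google's or Hugging Face's API.

```python
import random

MASK = "[MASK]"


def mask_tokens(tokens, vocab, mask_prob=0.15, rng=None):
    """Return (inputs, labels); labels[i] is the original token at target positions, else None."""
    rng = rng or random.Random()
    inputs = list(tokens)
    labels = [None] * len(tokens)
    for i, token in enumerate(tokens):
        if rng.random() < mask_prob:           # select ~15% of positions as targets
            labels[i] = token
            roll = rng.random()
            if roll < 0.8:
                inputs[i] = MASK               # 80%: replace with [MASK]
            elif roll < 0.9:
                inputs[i] = rng.choice(vocab)  # 10%: replace with a random token
            # remaining 10%: keep the original token
    return inputs, labels


tokens = "the cat sat on the mat".split()
inputs, labels = mask_tokens(tokens, vocab=["dog", "ran", "hat"], rng=random.Random(0))
print(inputs)
print(labels)
```

The model receives `inputs` and is trained to predict the tokens in `labels` wherever they are not `None`. To fill each gap it must use context from both the left and the right, which is where BERT's bidirectionality comes from.
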





$$\text{Speech Processing}$$

- Articulatory Phonetics  
    - Speech production involves:  
        1. Initiation: Air motion in the vocal tract.  
        2. Phonation: Airflow modification in the larynx.  
        3. Articulation: Air shaping for specific sounds.  

    - Key factors in consonant production:  
        - Voice: Voiced (e.g., /b,d,v,m/) vs. Voiceless (e.g., /s,t,p,f/).  
        - Place: E.g., bilabial (/p,b/), dental (/θ,ð/), alveolar (/t,d/).  
        - Manner: E.g., stops (/p,b/), nasals (/m,n/), fricatives (/f,v/), approximants (/w,j/).

    - Vowels  
        - Vertical Tongue Position: Close (e.g., /ɪ/) vs. Open (e.g., /æ/).  
        - Horizontal Tongue Position: Front (/e/), Central (/ə/), Back (/ʌ/).  
        - Lip Shape: Rounded (/u/), Spread (/ɪ/).  

        - Types:  
            - Monophthongs: Vowels with a single, steady quality (e.g., /e/).  
            - Diphthongs: Vowels that glide from one quality to another (e.g., /aʊ/).


- Acoustic Phonetics  
    - Studies the physical properties of speech sounds (e.g., frequency, amplitude, duration), typically analyzed through waveforms and spectrograms.

        ![image.png](attachment:image.png)

        ![image-2.png](attachment:image-2.png)
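
Two of the physical properties named above, frequency and amplitude, can be measured directly from sampled audio. This small sketch uses a synthetic 440 Hz tone instead of recorded speech. For readability it computes a naive O(n²) discrete Fourier transform; real tools use an FFT (e.g. `numpy.fft`). All function names are illustrative.

```python
import cmath
import math


def sine_wave(freq_hz, amplitude, sample_rate, num_samples):
    """Sampled pure tone: x[t] = A * sin(2*pi*f*t / sample_rate)."""
    return [amplitude * math.sin(2 * math.pi * freq_hz * t / sample_rate)
            for t in range(num_samples)]


def rms_amplitude(samples):
    """Root-mean-square amplitude, a common measure of signal strength."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def dominant_frequency(samples, sample_rate):
    """Frequency (Hz) of the strongest DFT bin, skipping DC and stopping below Nyquist."""
    n = len(samples)
    best_k, best_mag = 0, -1.0
    for k in range(1, n // 2):
        coeff = sum(s * cmath.exp(-2j * math.pi * k * t / n)
                    for t, s in enumerate(samples))
        if abs(coeff) > best_mag:
            best_k, best_mag = k, abs(coeff)
    return best_k * sample_rate / n  # bin index -> Hz


tone = sine_wave(440, 0.5, sample_rate=8000, num_samples=800)
print(dominant_frequency(tone, 8000))  # 440.0
print(rms_amplitude(tone))             # about 0.354 (= 0.5 / sqrt(2))
```

The frequency resolution is `sample_rate / num_samples` (10 Hz here). Real speech contains many frequencies at once, so analysis tools examine the whole spectrum over short windows (a spectrogram) rather than a single peak.
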

- Phonology  
    - Studies language-specific sound systems and their rules.


- Computational Phonology  
    - Uses computational techniques for phonological modeling and sound pattern analysis.
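
The English regular plural is a classic target for computational phonology. Its suffix surfaces as [ɪz] after sibilants, [s] after other voiceless sounds, and [z] elsewhere. A minimal rule-based sketch follows. It assumes words arrive already transcribed as lists of IPA segments; the transcriptions and segment classes are simplified, illustrative assumptions.

```python
# Simplified segment classes (illustrative, not a complete English inventory).
SIBILANTS = {"s", "z", "ʃ", "ʒ", "tʃ", "dʒ"}
VOICELESS_NON_SIBILANTS = {"p", "t", "k", "f", "θ"}


def plural_suffix(segments):
    """Choose the plural allomorph from the word's final segment."""
    final = segments[-1]
    if final in SIBILANTS:
        return ["ɪ", "z"]  # bus -> buses, church -> churches
    if final in VOICELESS_NON_SIBILANTS:
        return ["s"]       # cat -> cats
    return ["z"]           # voiced consonants and vowels: dog -> dogs


def pluralize(segments):
    return segments + plural_suffix(segments)


print("".join(pluralize(["k", "æ", "t"])))     # kæts
print("".join(pluralize(["d", "ɒ", "g"])))     # dɒgz
print("".join(pluralize(["tʃ", "ɜː", "tʃ"])))  # tʃɜːtʃɪz
```

Order matters: the sibilant check must come before the voicing check, because /s/ and /ʃ/ are themselves voiceless.
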


- Digital Signal Processing (DSP)  
    - Processes real-world signals (audio, video) for analysis and conversion.
    - Components: Program Memory, Data Memory, Compute Engine, I/O.

        ![image-3.png](attachment:image-3.png)
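
A finite impulse response (FIR) filter is a textbook example of the processing a DSP performs. This minimal pure-Python sketch uses a moving-average filter as the example; the function names are illustrative. On hardware, the coefficients and samples would roughly correspond to data memory and the loop to program memory. The inner multiply-accumulate is the operation a DSP's compute engine is built to run quickly.

```python
def fir_filter(signal, coeffs):
    """Finite impulse response filter: y[n] = sum_k coeffs[k] * x[n - k].

    Samples before the start of the signal are treated as zero.
    """
    output = []
    for n in range(len(signal)):
        acc = 0.0
        for k, c in enumerate(coeffs):
            if n - k >= 0:
                acc += c * signal[n - k]  # multiply-accumulate (MAC)
        output.append(acc)
    return output


def moving_average(signal, width):
    """Low-pass smoothing: an FIR filter with `width` equal taps."""
    return fir_filter(signal, [1.0 / width] * width)


noisy = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0]  # alternating high-frequency noise
print(moving_average(noisy, 2))  # [0.5, 0.0, 0.0, 0.0, 0.0, 0.0]
```
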


- Automatic Speech Recognition (ASR)  
    - Converts speech to text using:  
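        1. Traditional Approach: Hidden Markov Models with Gaussian Mixture Models (HMM/GMM); building the separate acoustic, pronunciation, and language models is time-intensive.  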
            ![image-4.png](attachment:image-4.png)
        2. End-to-End Deep Learning: Maps audio directly to text with architectures such as CTC, LAS (Listen, Attend and Spell), and RNN-T.

    - Applications: Telephony, video platforms, virtual meetings.
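
Of the end-to-end approaches listed, CTC has the simplest decoding rule. The network emits a label distribution for each audio frame, including a special blank symbol. The best path is then collapsed by merging repeated labels and deleting blanks. The greedy (best-path) sketch below assumes the per-frame probabilities are already available. The `_` blank symbol and the function names are illustrative, and real decoders usually add beam search and a language model.

```python
BLANK = "_"


def best_path(frame_probs, labels):
    """Greedy decoding: pick the most probable label in every frame."""
    return [labels[max(range(len(p)), key=p.__getitem__)] for p in frame_probs]


def ctc_collapse(path, blank=BLANK):
    """CTC rule: merge consecutive repeats, then drop blanks."""
    out, prev = [], None
    for label in path:
        if label != prev and label != blank:
            out.append(label)
        prev = label
    return "".join(out)


labels = ["_", "a", "b"]
frames = [[0.1, 0.8, 0.1], [0.2, 0.7, 0.1], [0.9, 0.05, 0.05], [0.1, 0.2, 0.7]]
print(ctc_collapse(best_path(frames, labels)))  # ab
print(ctc_collapse(list("hh_e_ll_lo")))         # hello
```

The blank is what allows genuine double letters. "ll" collapses to a single "l", but "l_l" keeps both.
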


- Text-to-Speech (TTS)  
    - Converts text into speech using text analysis and DSP. Advanced systems power personalized conversational assistants such as Amazon Alexa.

        ![image-5.png](attachment:image-5.png)

- Speech Synthesis  
    - Produces artificial speech either by concatenating units of recorded speech or by modeling the vocal tract (articulatory synthesis). Applications include TTS and speech recognition.

        ![image-6.png](attachment:image-6.png)


- Language Models  
    - Assign probabilities to word sequences, improving ASR (by ranking candidate transcriptions) and TTS performance.
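
In ASR, a language model is often used to rescore competing transcriptions that sound alike. This minimal sketch assumes a bigram model with add-one (Laplace) smoothing, trained on a tiny hypothetical corpus; production systems use far larger n-gram or neural models.

```python
import math
from collections import Counter


def train_bigram(sentences):
    """Count bigrams (with sentence boundary markers) from whitespace-tokenized text."""
    context_counts, bigram_counts = Counter(), Counter()
    vocab = {"</s>"}
    for sentence in sentences:
        words = ["<s>"] + sentence.split() + ["</s>"]
        vocab.update(words[1:-1])
        context_counts.update(words[:-1])
        bigram_counts.update(zip(words[:-1], words[1:]))
    return context_counts, bigram_counts, len(vocab)


def log_prob(sentence, model):
    """Log P(sentence) under an add-one smoothed bigram model."""
    context_counts, bigram_counts, vocab_size = model
    words = ["<s>"] + sentence.split() + ["</s>"]
    return sum(
        math.log((bigram_counts[(prev, word)] + 1) / (context_counts[prev] + vocab_size))
        for prev, word in zip(words[:-1], words[1:])
    )


model = train_bigram(["it is easy to recognize speech", "we recognize speech", "speech models help"])
hypotheses = ["wreck a nice beach", "recognize speech"]  # competing ASR outputs
print(max(hypotheses, key=lambda h: log_prob(h, model)))  # recognize speech
```

Add-one smoothing keeps unseen bigrams from getting zero probability, which would otherwise rule out any hypothesis containing a new word pair.
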
 

$$\text{NLP Applications}$$
 
- Lexicon
    - A lexicon is the vocabulary of a language or knowledge domain, cataloging its words or lexemes.
    - Derived from the Greek word *lexikon*, meaning "of or for words."
    - Constitutes one part of a language, alongside grammar, which provides rules for word combinations.
- Dictionary
    - Lists lexemes, often alphabetically or by character stroke.
    - Includes definitions, etymologies, pronunciations, translations, etc.
- Thesaurus
    - A reference for finding synonyms and antonyms of words.
    - Aids writers in selecting precise words for ideas.
- Transliteration
    - Converts text phonetically from one writing system to another.
    - Example: "namaste" typed in Latin letters becomes "नमस्ते" in Devanagari, the script used for Hindi.
    - Distinct from translation, focusing on pronunciation rather than meaning.
- Spell Checker and Auto-Correct
    - Detects and corrects spelling errors using:
        - Levenshtein Automaton: For generating correction candidates.
        - Neural Language Models: For ranking corrections.
    - Commonly pre-trained and ready for deployment.
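
The candidate-then-rank pipeline can be sketched in a few lines, with two substitutions made plainly here. Candidates come from a brute-force Levenshtein-distance scan over the vocabulary; a Levenshtein automaton finds the same within-distance candidates far more efficiently. Ranking uses word frequency instead of a neural language model. The corpus and function names are hypothetical.

```python
from collections import Counter


def levenshtein(a, b):
    """Minimum number of insertions, deletions, and substitutions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,               # delete ca
                           cur[j - 1] + 1,            # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = cur
    return prev[-1]


def correct(word, word_counts, max_dist=2):
    """Return word if known, else the closest known word (ties broken by frequency)."""
    if word in word_counts:
        return word
    candidates = [w for w in word_counts if levenshtein(word, w) <= max_dist]
    if not candidates:
        return word
    return min(candidates, key=lambda w: (levenshtein(word, w), -word_counts[w]))


counts = Counter("the spelling checker checks the spelling of the text".split())
print(correct("speling", counts))  # spelling
print(correct("teh", counts))      # the
```

"teh" is two edits from both "the" and "text", so frequency breaks the tie. Context-aware ranking with a language model would also catch correctly spelled but wrong words, which this sketch cannot.
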
- Connected Applications
    1. Grammar Checker: Ensures grammatical correctness in text.
    2. Domain Classification: Utilizes ML for content categorization.
    3. Language Identification: Employs ML/DL methods to detect language.
    4. Auto-suggest/Complete: Predicts next words using RNNs.
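
The ML/DL methods for language identification are not specified above, so this sketch swaps in a classic lightweight baseline instead. It compares character-trigram frequency profiles using cosine similarity. The two "training" sentences are hypothetical toy samples; real systems build profiles from large corpora or use trained classifiers.

```python
import math
from collections import Counter


def char_trigrams(text):
    """Counts of character trigrams, padded so word edges are captured."""
    padded = f"  {text.lower()} "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def identify_language(text, profiles):
    grams = char_trigrams(text)
    return max(profiles, key=lambda lang: cosine(grams, profiles[lang]))


# Tiny illustrative training samples; real profiles come from large corpora.
profiles = {
    "en": char_trigrams("the quick brown fox jumps over the lazy dog and the cat"),
    "es": char_trigrams("el rápido zorro marrón salta sobre el perro perezoso y el gato"),
}
print(identify_language("the dog and the fox", profiles))  # en
print(identify_language("el perro y el gato", profiles))   # es
```
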
- Machine Translation (MT)
    - Automates translation between languages.
    - Neural Machine Translation (NMT):
        - Uses neural networks to model sentence sequences.
        - Trains end-to-end for higher efficiency and accuracy.
        - Examples: Baidu's system (2015), Google NMT (2016).
- Information Extraction
    - Converts unstructured data into structured formats using NLP algorithms.
    - Techniques include Named Entity Recognition (NER), increasingly implemented with deep learning models.
    - Applications:
        - Summarizing large text collections.
        - Conversational AI (e.g., chatbots).
        - Extracting stock market data or medical records.
- Web Scraping
    - Collects and parses raw web data.
    - Commonly used in data science, business intelligence, and investigative research.
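
Parsing is the half of scraping that turns raw HTML into structured data. This minimal sketch uses only Python's standard-library `html.parser` and assumes the page has already been downloaded. The sample HTML and the `LinkExtractor` name are illustrative.

```python
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collects the href of every <a> tag seen while parsing."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names before calling this.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(html):
    parser = LinkExtractor()
    parser.feed(html)
    parser.close()
    return parser.links


page = '<html><body><a href="/docs">Docs</a> <p>text</p> <a href="https://example.com">Ex</a></body></html>'
print(extract_links(page))  # ['/docs', 'https://example.com']
```

To fetch a live page, `urllib.request.urlopen(url).read().decode("utf-8")` could supply the HTML. Check the site's robots.txt first (for example with `urllib.robotparser`), along with its terms of use.
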
- Question Answering (QA)
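    - Conversational QA requires understanding each question in the context of the preceding dialogue.
    - Decomposed into:
        - Question Rewriting (turning a context-dependent question into a self-contained one).
        - Question Answering sub-tasks.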
- Speech Technologies
    - Enables devices to recognize and analyze spoken words or audio.
    - Relies on signal processing and machine learning for tasks like speaker identification and noise reduction.
- OCR (Optical Character Recognition)
    - Converts images of text (printed, handwritten, or typed) into machine-readable text.
    - Applied to scanned documents, photos, subtitles, etc.
- Chatbots
    - Simulate human-like conversation through text or speech.
    - Commonly used in customer support and require regular updates to improve interaction quality.

