# Model 1: Rule-Based Approach with Sliding Window + Beam Search
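This model predicts an emoji from emotion lexicons (positive and negative), a sarcasm keyword dictionary, and a sliding window that captures contextual phrases. Beam search generates and ranks candidate emoji sequences and selects the one with the highest cumulative score from lexicon matches and context relevance. The cell below implements the core of that design: a word-level lexicon lookup, a negation flag, and a top-k ranking step (`beam_search`). It does not include the sarcasm dictionary or the sliding window.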
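To make the sliding-window and beam-search idea concrete, here is a minimal sketch rather than the notebook's implementation. The phrase lexicon `PHRASE_SCORES`, the helper names, the window size, and the beam width are all assumptions chosen for illustration.

```python
from typing import Dict, Iterator, List, Tuple

# Hypothetical phrase-level lexicon (illustrative values, not from the notebook).
PHRASE_SCORES: Dict[str, Dict[str, int]] = {
    "love": {"😊": 2},
    "so sad": {"😔": 3},
    "hate": {"🤢": 1, "😠": 1},
}

def sliding_windows(tokens: List[str], max_size: int = 2) -> Iterator[str]:
    """Yield every contiguous window of 1..max_size tokens, left to right."""
    for start in range(len(tokens)):
        for size in range(1, max_size + 1):
            if start + size <= len(tokens):
                yield " ".join(tokens[start:start + size])

def candidate_steps(tokens: List[str]) -> List[List[Tuple[str, int]]]:
    """Collect one list of (emoji, score) options per window found in the lexicon."""
    return [list(PHRASE_SCORES[w].items())
            for w in sliding_windows(tokens) if w in PHRASE_SCORES]

def beam_search(steps, beam_width: int = 2):
    """Extend emoji sequences step by step, keeping the beam_width best totals."""
    beams = [((), 0)]  # (emoji sequence so far, cumulative score)
    for options in steps:
        expanded = [(seq + (emoji,), total + score)
                    for seq, total in beams
                    for emoji, score in options]
        expanded.sort(key=lambda beam: beam[1], reverse=True)
        beams = expanded[:beam_width]
    return beams

tokens = "i love it but i am so sad".split()
best_sequence, best_score = beam_search(candidate_steps(tokens))[0]
print(best_sequence, best_score)
```

Each matching window contributes one step, and the beam keeps only the highest-scoring partial sequences after every step, so the cost grows with the beam width rather than with the number of possible sequences.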

In [1]:
#model 1

import re

# Lexicon mapping words to per-emoji sentiment scores.
# An entry may also carry an optional "negated" dict that is applied
# when the word follows a negation (none are defined here).
emotion_lexicon = {
    "love": {"😊": 2},
    "like": {"😊": 1},
    "happy": {"😊": 2},
    "joy": {"😊": 2},
    "amazing": {"😊": 2},

    "sad": {"😔": 2},
    "disappointed": {"😔": 2},
    "crying": {"😔": 3},
    "failed": {"😔": 2},

    "disgusting": {"🤢": 3},
    "gross": {"🤢": 2},
    "yuck": {"🤢": 3},
    "hate": {"🤢": 1, "😠": 1},

    "afraid": {"😨": 2},
    "scared": {"😨": 2},
    "terrified": {"😨": 3},
    "nightmare": {"😨": 2},

    "angry": {"😠": 3},
    "mad": {"😠": 2},
    "furious": {"😠": 3}
}

# Words indicating negation (one shared set, used by simple_pos_tag)
negation_words = {"not", "no", "never", "don't", "didn't", "isn't", "wasn't", "won't", "can't", "couldn't"}

# Preprocessing: Clean and tokenize
def preprocess(sentence):
    sentence = sentence.lower().replace("’", "'")  # treat curly apostrophes like straight ones
    sentence = re.sub(r"[^a-zA-Z0-9\s']", "", sentence)
    tokens = sentence.split()
    return tokens

# Simple rule-based tagging: mark negation words, tag everything else as WORD
def simple_pos_tag(tokens):
    tagged = []
    for word in tokens:
        if word in negation_words:
            tagged.append((word, "NEG"))
        else:
            tagged.append((word, "WORD"))
    return tagged


# Emotion scoring logic with negation handling
def get_emotion_scores(pos_tags):
    scores = {"😊": 0, "😔": 0, "🤢": 0, "😨": 0, "😠": 0}
    negate = False
    for word, tag in pos_tags:
        if tag == "NEG":
            negate = True
            continue
        if word in emotion_lexicon:
            lex = emotion_lexicon[word]
            if negate:
                # A negated word contributes only its "negated" scores, if it has any
                for emoji, val in lex.get("negated", {}).items():
                    scores[emoji] += val
                negate = False  # the negation applies to the next emotion word only
            else:
                for emoji, val in lex.items():
                    if emoji != "negated":
                        scores[emoji] += val
    return scores


# Ranking step (named beam_search): with a single score table this reduces to sorting and keeping the top-k emojis
def beam_search(scores, k=3):
    sorted_emojis = sorted(scores.items(), key=lambda x: x[1], reverse=True)
    return sorted_emojis[:k]

# Final prediction function
def predict_emoji(sentence):
    tokens = preprocess(sentence)
    pos_tags = simple_pos_tag(tokens)
    scores = get_emotion_scores(pos_tags)
    top_emojis = beam_search(scores)
    return top_emojis[0][0] if top_emojis[0][1] > 0 else "🤔"  # fallback emoji

# Sample sentences to test
test_sentences = [
    "I love pizza",
    "I hate this food",
    "She got me a burger",
    "I don't like burgers",
    "This is disgusting",
    "I'm scared of the dark",
    "He failed his test again",
    "Wow, that went great!",
    "What a nightmare",
    "I’m not happy with the results"
]

# Print predictions
for sentence in test_sentences:
    print(f"{sentence} → {predict_emoji(sentence)}")


I love pizza → 😊
I hate this food → 🤢
She got me a burger → 🤔
I don't like burgers → 🤔
This is disgusting → 🤢
I'm scared of the dark → 😨
He failed his test again → 😔
Wow, that went great! → 🤔
What a nightmare → 😨
I’m not happy with the results → 🤔


### Evaluation: Model 1
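The model identifies basic emotional cues and works well on single-clause inputs. It has difficulty with complex sentence structures, especially negation and sarcasm: a negated emotion word is dropped rather than inverted, so "I don't like burgers" and "I’m not happy with the results" both fall back to 🤔, and words missing from the lexicon (such as "great") are not scored at all. The rule-based design keeps the model interpretable but limits its adaptability across varied sentence constructions.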
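One way the negation gap could be narrowed within Model 1's own data format is an optional `negated` sub-dictionary on lexicon entries, which `get_emotion_scores` already checks for. The sketch below is self-contained and uses hypothetical names (`demo_lexicon`, `demo_negations`, `score_tokens`) and hypothetical weights; none of these entries exist in the notebook's lexicon.

```python
# Hypothetical entries: "negated" holds the scores to use after a negation word.
demo_lexicon = {
    "like": {"😊": 1, "negated": {"😔": 1}},
    "happy": {"😊": 2, "negated": {"😔": 2}},
}
demo_negations = {"not", "no", "never", "don't", "didn't"}

def score_tokens(tokens):
    """Sum emoji scores, switching to the 'negated' scores after a negation word."""
    scores = {}
    negate = False
    for word in tokens:
        if word in demo_negations:
            negate = True
            continue
        entry = demo_lexicon.get(word)
        if entry is None:
            continue
        if negate:
            chosen = entry.get("negated", {})
            negate = False  # the negation is consumed by the first emotion word
        else:
            chosen = {k: v for k, v in entry.items() if k != "negated"}
        for emoji, value in chosen.items():
            scores[emoji] = scores.get(emoji, 0) + value
    return scores

print(score_tokens("i don't like burgers".split()))
print(score_tokens("i like burgers".split()))
```

Keeping the inverted scores next to the word they belong to preserves the interpretability of the lexicon, at the cost of having to author a `negated` entry for each word by hand.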

# [IMPROVED] Model 2: Enhanced Rule-Based Approach with Parse Trees & Scoped Negation
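Building on the first model, this version adds POS tagging and emotional phrase chunking: NLTK's `RegexpParser` builds a shallow parse tree from which noun and adjective phrases are extracted. Negation is scoped to the three tokens preceding an emotion word, and a negated emotion is either redirected (happy to sad, sad or disgust to happy) or subtracted (fear, angry). Emotional intensity is modulated through weighted lexicons, with a +1 boost for emotion words that fall inside an extracted phrase. These additions let the model better disambiguate emotional content in negated or more complex statements.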
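In the code that follows, negation scope is approximated by looking back a fixed number of tokens from each emotion word. That check can be isolated as a small helper. This is a sketch only: the function name `is_negated`, the `window` parameter, and the shortened `NEGATIONS` set are illustrative and do not appear in the notebook.

```python
# Illustrative subset of negation words.
NEGATIONS = {"not", "no", "never", "don't", "didn't", "isn't", "wasn't"}

def is_negated(words, index, window=3):
    """Return True if any of the `window` tokens before words[index] is a negation."""
    start = max(0, index - window)
    return any(w in NEGATIONS for w in words[start:index])

words = "i'm not very happy but the food was great".split()
print(is_negated(words, words.index("happy")))  # True: "not" is two tokens back
print(is_negated(words, words.index("great")))  # False: "not" is outside the window
```

A fixed window is cheap and predictable, but it can miss a negation that sits further away and can wrongly capture one that belongs to a neighbouring clause.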

In [None]:
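#model 2

import re
# Requires NLTK's tokenizer and POS-tagger data, e.g. nltk.download('punkt') and
# nltk.download('averaged_perceptron_tagger'); recent NLTK releases name these
# 'punkt_tab' and 'averaged_perceptron_tagger_eng'.
import nltk
from nltk import pos_tag, word_tokenize
from nltk.tree import Tree
from nltk.chunk import RegexpParser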


# Emotion to Emoji Mapping
emoji_map = {
    'happy': '😊',
    'sad': '😔',
    'disgust': '🤢',
    'fear': '😨',
    'angry': '😠'
}

# Emotion Lexicon with weights
emotion_lexicon = {
    'happy': {'love': 2, 'like': 1, 'joy': 2, 'delicious': 1, 'great': 1, 'happy': 2, 'excited': 2},
    'sad': {'sad': 2, 'failed': 1, 'crying': 2, 'regret': 1, 'unhappy': 2, 'disappointed': 1},
    'disgust': {'hate': 2, 'disgusting': 3, 'gross': 2, 'yuck': 1, 'nasty': 2},
    'fear': {'scared': 2, 'afraid': 1, 'terrified': 3, 'nightmare': 2, 'horror': 2, 'panic': 1},
    'angry': {'angry': 2, 'furious': 3, 'mad': 2, 'annoyed': 1, 'rage': 3}
}

negations = {"not", "no", "never", "don't", "didn't", "isn't", "wasn't", "aren't", "can't", "won't"}

# Sentences to test
sentences = [
    "I love pizza",
    "I hate this food",
    "She got me a burger",
    "I don't like burgers",
    "This is disgusting",
    "I'm scared of the dark",
    "He failed his test again",
    "Wow, that went great!",
    "What a nightmare",
    "I’m not happy with the results"
]

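# Preprocessing: lowercase, normalize apostrophes, strip punctuation, tokenize
def preprocess(sentence):
    sentence = sentence.lower().replace("’", "'")  # so "don’t" still matches the negation set
    sentence = re.sub(r"[^\w\s']", "", sentence)
    return sentence.split()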

# Use parse tree to extract relevant chunks (noun/adjective phrases)
def get_phrases(sentence):
    tokens = word_tokenize(sentence)
    tagged = pos_tag(tokens)

    grammar = r"""
        NP: {<DT>?<JJ.*>*<NN.*>+}       # Noun phrases
        ADJP: {<RB.?>*<JJ>}             # Adjective phrases
    """
    parser = RegexpParser(grammar)
    tree = parser.parse(tagged)

    key_chunks = []
    for subtree in tree:
        if isinstance(subtree, Tree):
            phrase = " ".join(word for word, tag in subtree.leaves())
            key_chunks.append(phrase.lower())
    return key_chunks

# Scoring with negation, weights, and phrase importance
def score_sentence(sentence):
    words = preprocess(sentence)
    # Individual words that occur inside an extracted noun/adjective phrase
    phrase_words = {w for phrase in get_phrases(sentence) for w in phrase.split()}
    score = {emotion: 0 for emotion in emoji_map}

    for i, word in enumerate(words):
        # Negation scope: any negation among the three preceding tokens
        is_negated = any(w in negations for w in words[max(0, i - 3):i])

        for emotion, keywords in emotion_lexicon.items():
            if word in keywords:
                base_weight = keywords[word]
                if word in phrase_words:  # whole-word match, not a substring match
                    base_weight += 1  # boost for being in a noun/adj phrase
                if is_negated:
                    # Redirect negated happy/sad/disgust to the opposite polarity;
                    # negated fear/anger just lowers that emotion's score
                    if emotion == 'happy':
                        score['sad'] += base_weight
                    elif emotion == 'sad':
                        score['happy'] += base_weight
                    elif emotion == 'disgust':
                        score['happy'] += base_weight
                    else:
                        score[emotion] -= base_weight
                else:
                    score[emotion] += base_weight
    return score

# Predict using max score; fall back to 🤔 when no emotion scores above zero
def predict_emoji(sentence):
    scores = score_sentence(sentence)
    best_emotion = max(scores, key=scores.get)
    if scores[best_emotion] <= 0:
        return '🤔'  # neutral input: without this check max() returns the first key, 'happy'
    return emoji_map[best_emotion]

# Output
for s in sentences:
    print(f"{s} → {predict_emoji(s)}")


I love pizza → 😊
I hate this food → 🤢
She got me a burger → 🤔
I don't like burgers → 😔
This is disgusting → 🤢
I'm scared of the dark → 😨
He failed his test again → 😔
Wow, that went great! → 😊
What a nightmare → 😨
I’m not happy with the results → 😔


### Evaluation: Model 2
This enhanced approach gives more accurate predictions on the sample inputs, most visibly under negation: "I don't like burgers" and "I’m not happy with the results" now resolve to 😔 instead of the fallback, and the wider lexicon covers "great". Sarcasm is still not modelled explicitly, so "Wow, that went great!" is read literally as 😊. Tagging and chunking add computational overhead, but the trade-off is better contextual understanding and prediction accuracy than the first model.

## Final Conclusion
The transition from a basic lexicon-based method to a syntactically aware rule-based system improves both precision and context handling in emoji prediction. While the current framework remains interpretable and domain-specific, it highlights the limitations of rule-based NLP for broader generalization. Future work includes implementing a data-driven architecture, such as RNNs or transformer-based models, trained on large-scale tweet-emoji datasets to enhance generalization, sarcasm detection, and emotional nuance.