# **Teste de Hipótese para análise da eficácia um Algoritmo de Recomendação de Filmes por meio da Similaridade do Cosseno** 🎥

In [29]:
import pandas as pd
import numpy as np
import math
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
from scipy.stats import ttest_1samp

# Função para calcular o ângulo do cosseno
def calcular_angulo_cosseno(vetor1, vetor2):
    produto_interno = np.dot(vetor1, vetor2)
    norma_vetor1 = np.linalg.norm(vetor1)
    norma_vetor2 = np.linalg.norm(vetor2)
    return math.degrees(math.acos(produto_interno / (norma_vetor1 * norma_vetor2)))

# Carregando a Base de Dados
base = pd.read_csv('movies_metadata.csv', low_memory=False)
base = base[base['original_language'] == 'en']
base = base[base['vote_count'] > 999]
base = base[['id', 'original_title', 'genres', 'overview']]
base.rename(columns={'id': 'ID', 'original_title': 'TITLE', 'genres': 'GENRES', 'overview': 'SYNOPSIS'}, inplace=True)
base.dropna(inplace=True)

# Removendo as "Stopwords"
vectorizer = TfidfVectorizer(stop_words=('english'))
dados = base[['TITLE', 'SYNOPSIS']]
texto = dados.apply(lambda x: ''.join(x.astype(str)), axis=1)

# Criando um Objeto TF-IDF Vectorizer
vetorizador = TfidfVectorizer()
tfidf_matrix = vetorizador.fit_transform(texto)

# Lista de sinopses dos filmes
sinopses = [
    "It's Ted the Bellhop's first night on the job...and the hotel's very unusual guests are about to place him in some outrageous predicaments. It seems that this evening's room service is serving up one unbelievable happening after another.",
    "While racing to a boxing match, Frank, Mike, John and Rey get more than they bargained for. A wrong turn lands them directly in the path of Fallon, a vicious, wise-cracking drug lord. After accidentally witnessing Fallon murder a disloyal henchman, the four become his unwilling prey in a savage game of cat & mouse as they are mercilessly stalked through the urban jungle in this taut suspense drama",
    "Timo Novotny labels his new project an experimental music documentary film, in a remix of the celebrated film Megacities (1997), a visually refined essay on the hidden faces of several world ""megacities"" by leading Austrian documentarist Michael Glawogger. Novotny complements 30 % of material taken straight from the film (and re-edited) with 70 % as yet unseen footage in which he blends original shots unused by Glawogger with his own sequences (shot by Megacities cameraman Wolfgang Thaler) from Tokyo. Alongside the Japanese metropolis, Life in Loops takes us right into the atmosphere of Mexico City, New York, Moscow and Bombay. This electrifying combination of fascinating film images and an equally compelling soundtrack from Sofa Surfers sets us off on a stunning audiovisual adventure across the continents. The film also makes an original contribution to the discussion on new trends in documentary filmmaking. Written by KARLOVY VARY IFF 2006",
    "Princess Leia is captured and held hostage by the evil Imperial forces in their effort to take over the galactic Empire. Venturesome Luke Skywalker and dashing captain Han Solo team together with the loveable robot duo R2-D2 and C-3PO to rescue the beautiful princess and restore peace and justice in the Empire.",
    "Nemo, an adventurous young clownfish, is unexpectedly taken from his Great Barrier Reef home to a dentist's office aquarium. It's up to his worrisome father Marlin and a friendly but forgetful fish Dory to bring Nemo home -- meeting vegetarian sharks, surfer dude turtles, hypnotic jellyfish, hungry seagulls, and more along the way.",
    "A man with a low IQ has accomplished great things in his life and been present during significant historic eventsâ€”in each case, far exceeding what anyone imagined he could do. But despite all he has achieved, his one true love eludes him.",
    "Lester Burnham, a depressed suburban father in a mid-life crisis, decides to turn his hectic life around after developing an infatuation with his daughter's attractive friend.",
    "Newspaper magnate, Charles Foster Kane is taken from his mother as a boy and made the ward of a rich industrialist. As a result, every well-meaning, tyrannical or self-destructive move he makes for the rest of his life appears in some way to be a reaction to that deeply wounding event.",
    "Selma, a Czech immigrant on the verge of blindness, struggles to make ends meet for herself and her son, who has inherited the same genetic disorder and will suffer the same fate without an expensive operation. When life gets too difficult, Selma learns to cope through her love of musicals, escaping life's troubles - even if just for a moment - by dreaming up little numbers to the rhythmic beats of her surroundings.",
    "AdÃ¨le and her daughter Sarah are traveling on the Welsh coastline to see her husband James when Sarah disappears. A different but similar looking girl appears who says she died in a past time. AdÃ¨le tries to discover what happened to her daughter as she is tormented by Celtic mythology from the past.",
    "In 2257, a taxi driver is unintentionally given the task of saving a young girl who is part of the key that will ensure the survival of humanity.",
    "A fatally ill mother with only two months to live creates a list of things she wants to do before she dies without telling her family of her illness.",
    "Bruce Brown's The Endless Summer is one of the first and most influential surf movies of all time. The film documents American surfers Mike Hynson and Robert August as they travel the world during Californiaâ€™s winter (which, back in 1965 was off-season for surfing) in search of the perfect wave and ultimately, an endless summer.",
    "Jack Sparrow, a freewheeling 18th-century pirate, quarrels with a rival pirate bent on pillaging Port Royal. When the governor's daughter is kidnapped, Sparrow decides to help the girl's love save her.",
    "An assassin is shot by her ruthless employer, Bill, and other members of their assassination circle â€“ but she lives to plot her vengeance.",
    "Jarhead is a film about a US Marine Anthony Swoffordâ€™s experience in the Gulf War. After putting up with an arduous boot camp, Swofford and his unit are sent to the Persian Gulf where they are eager to fight, but are forced to stay back from the action. Swofford struggles with the possibility of his girlfriend cheating on him, and as his mental state deteriorates, his desire to kill increases.",
    "Matt, a young glaciologist, soars across the vast, silent, icebound immensities of the South Pole as he recalls his love affair with Lisa. They meet at a mobbed rock concert in a vast music hall - London's Brixton Academy. They are in bed at night's end. Together, over a period of several months, they pursue a mutual sexual passion whose inevitable stages unfold in counterpoint to nine live-concert songs.",
    "At the height of the Vietnam war, Captain Benjamin Willard is sent on a dangerous mission that, officially, ""does not exist, nor will it ever exist."" His goal is to locate - and eliminate - a mysterious Green Beret Colonel named Walter Kurtz, who has been leading his personal army on illegal guerrilla missions into enemy territory.",
    "William Munny is a retired, once-ruthless killer turned gentle widower and hog farmer. To help support his two motherless children, he accepts one last bounty-hunter mission to find the men who brutalized a prostitute. Joined by his former partner and a cocky greenhorn, he takes on a corrupt sheriff.",
    "After Homer accidentally pollutes the town's water supply, Springfield is encased in a gigantic dome by the EPA and the Simpsons are declared fugitives.",
    "Joel Barish, heartbroken that his girlfriend underwent a procedure to erase him from her memory, decides to do the same. However, as he watches his memories of her fade away, he realises that he still loves her, and may be too late to correct his mistake.",
    "Captain Jack Sparrow works his way out of a blood debt with the ghostly Davy Jones to avoid eternal damnation.",
    "An average family is thrust into the spotlight after the father commits a seemingly self-defense murder at his diner.",
    "Humanity finds a mysterious object buried beneath the lunar surface and sets off to find its origins with the help of HAL 9000, the world's most advanced super computer.",
    "In the year 2035, convict James Cole reluctantly volunteers to be sent back in time to discover the origin of a deadly virus that wiped out nearly all of the earth's population and forced the survivors into underground communities. But when Cole is mistakenly sent to 1990 instead of 1996, he's arrested and locked up in a mental hospital. There he meets psychiatrist Dr. Kathryn Railly, and patient Jeffrey Goines, the son of a famous virus expert, who may hold the key to the mysterious rogue group, the Army of the 12 Monkeys, thought to be responsible for unleashing the killer disease.",
    "The setting is Detroit in 1995. The city is divided by 8 Mile, a road that splits the town in half along racial lines. A young white rapper, Jimmy ""B-Rabbit"" Smith Jr. summons strength within himself to cross over these arbitrary boundaries to fulfill his dream of success in hip hop. With his pal Future and the three one third in place, all he has to do is not choke.",
    "A master thief coincidentally is robbing a house where a murderâ€”in which the President of The United States is involvedâ€”occurs in front of his eyes. He is forced to run, while holding evidence that could convict the President.",
    "Two childhood friends are recruited for a suicide bombing in Tel Aviv.",
    "Low-level bureaucrat Sam Lowry escapes the monotony of his day-to-day life through a recurring daydream of himself as a virtuous hero saving a beautiful damsel. Investigating a case that led to the wrongful arrest and eventual death of an innocent man instead of wanted terrorist Harry Tuttle, he meets the woman from his daydream, and in trying to help her gets caught in a web of mistaken identities, mindless bureaucracy and lies.",
    "A chronicle of country music legend Johnny Cash's life, from his early days on an Arkansas cotton farm to his rise to fame with Sun Records in Memphis, where he recorded alongside Elvis Presley, Jerry Lee Lewis and Carl Perkins.",
    "Despondent over a painful estrangement from his daughter, trainer Frankie Dunn isn't prepared for boxer Maggie Fitzgerald to enter his life. But Maggie's determined to go pro and to convince Dunn and his cohort to help her.",
    "Set against the background of the 1984 Miners' Strike, 11-year-old Billy Elliot stumbles out of the boxing ring and onto the ballet floor. He faces many trials and triumphs as he strives to conquer his family's set ways, inner conflict, and standing on his toes.",
    "Derek Vineyard is paroled after serving 3 years in prison for killing two African-American men. Through his brother, Danny Vineyard's narration, we learn that before going to prison, Derek was a skinhead and the leader of a violent white supremacist gang that committed acts of racial crime throughout L.A. and his actions greatly influenced Danny. Reformed and fresh out of prison, Derek severs contact with the gang and becomes determined to keep Danny from going down the same violent path as he did.",
    "Ray Ferrier is a divorced dockworker and less-than-perfect father. Soon after his ex-wife and her new husband drop off his teenage son and young daughter for a rare weekend visit, a strange and powerful lightning storm touches down.",
    "'We come in peace' is not what those green men from Mars mean when they invade our planet, armed with irresistible weapons and a cruel sense of humor.  This star studded cast must play victim to the alienâ€™s fun and games in this comedy homage to science fiction films of the '50s and '60s.",
    "On his way to Vienna, American Jesse meets CÃ©line, a student returning to Paris. After long conversations forge a surprising connection between them, Jesse convinces Celine to get off the train with him in Vienna. Since his flight to the U.S. departs the next morning and he has no money for lodging, they wander the city together, taking in the experiences of Vienna and each other.",
    "Leonard Shelby is tracking down the man who raped and murdered his wife. The difficulty of locating his wife's killer, however, is compounded by the fact that he suffers from a rare, untreatable form of short-term memory loss. Although he can recall details of life before his accident, Leonard cannot remember what happened fifteen minutes ago, where he's going, or why.",
    "In the smog-choked dystopian Los Angeles of 2019, blade runner Rick Deckard is called out of retirement to terminate a quartet of replicants who have escaped to Earth seeking their creator for a way to extend their short life spans.",
    "Nine years later, Jesse travels across Europe giving readings from a book he wrote about the night he spent in Vienna with Celine. After his reading in Paris, Celine finds him, and they spend part of the day together before Jesse has to again leave for a flight. They are both in relationships now, and Jesse has a son, but as their strong feelings for each other start to return, both confess a longing for more.",
    "A case involving drug lords and murder in South Florida takes a personal turn for undercover detectives Sonny Crockett and Ricardo Tubbs. Unorthodox Crockett gets involved romantically with the Chinese-Cuban wife of a trafficker of arms and drugs, while Tubbs deals with an assault on those he loves."
]

# Lista para armazenar os ângulos do cosseno
angulos_cosseno = []

# Consulta
query = "A man with a low IQ has accomplished great things in his life and been present during significant historic eventsâ€”in each case, far exceeding what anyone imagined he could do. But despite all he has achieved, his one true love eludes him."

# Transformando a consulta em Vetor Numérico
query_vetor = vetorizador.transform([query])

# Calculando a Similaridade do Cosseno entre a consulta e as sinopses
for sinopse in sinopses:
    # Transformando a sinopse em Vetor Numérico
    sinopse_vetor = vetorizador.transform([sinopse])

    # Calculando a Similaridade do Cosseno entre a consulta e a sinopse
    similaridade = cosine_similarity(query_vetor, sinopse_vetor)

    # Calculando o ângulo do cosseno
    angulo = calcular_angulo_cosseno(query_vetor.toarray()[0], sinopse_vetor.toarray()[0])

    # Armazenando o ângulo do cosseno
    angulos_cosseno.append(angulo)


# Exibindo a lista de ângulos do cosseno
print("Lista de Ângulos do Cosseno:", angulos_cosseno)

# Calculando a média e o desvio padrão dos ângulos do cosseno
media_amostra = np.mean(angulos_cosseno)
desvio_padrao_amostra = np.std(angulos_cosseno)

# Número de observações na amostra
n = len(angulos_cosseno)

# Valor esperado (85) sob a hipótese
valor_esperado = 85

# Calculando o t-score e o p-valor
t_score, p_valor = ttest_1samp(angulos_cosseno, valor_esperado)

# Imprimindo o t-score e o p-valor
print("\nTeste de Hipótese:")
print("T-score:", t_score)
print("P-valor:", p_valor)

# Comparando o p-valor com o nível de significância α = 0.05
# Interpretando o resultado do teste de hipótese
if p_valor < 0.05:
    print("\nRejeitamos a hipótese nula. Há evidências estatísticas para suportar a hipótese alternativa.")
    print("Interpretação: A média dos ângulos do cosseno é significativamente diferente de 85 graus.")
    print("Isso sugere que o algoritmo de recomendação está efetivamente identificando filmes com sinopses similares àquelas inseridas pelos usuários.")
    print("Quanto maior o ângulo entre dois vetores de sinopse (valores entre 0º e 180º), menor é o cosseno, indicando maior similaridade entre as sinopses.")
else:
    print("\nNão há evidências suficientes para rejeitar a hipótese nula.")
    print("Interpretação: A média dos ângulos do cosseno não difere significativamente de 85 graus.")
    print("Isso sugere que o algoritmo de recomendação pode não estar tão eficaz quanto o esperado em identificar filmes com sinopses semelhantes.")
    print("É possível que as sinopses inseridas pelos usuários não estejam bem representadas na base de dados ou que o algoritmo precise de ajustes para melhorar sua precisão.")


Lista de Ângulos do Cosseno: [88.29624805234207, 88.7975886924364, 87.92000977601451, 88.8762746073045, 87.85225920237242, 0.0, 87.0989776151731, 87.86812767706915, 87.62258338096439, 88.50069047147007, 89.5743887556667, 86.18459764663562, 86.96584262896074, 88.94781379211545, 89.32904270690645, 87.08580107238481, 87.53986127724517, 87.64749767082307, 87.41202127080135, 89.51543292990456, 85.43451502971841, 89.09263767509505, 89.43468229004449, 89.5888330104499, 87.84507402303464, 84.97788316314963, 86.34167117542114, 89.28400018578306, 83.0367840101053, 87.14924312360188, 87.9621233371351, 87.82433555694593, 88.66633162495782, 88.76369295595076, 88.16216610817024, 84.64452878908502, 85.81913623410544, 89.20288538009423, 83.5066457490418, 86.55145108071612]

Teste de Hipótese:
T-score: 0.18509166303623634
P-valor: 0.8541163654992798

Não há evidências suficientes para rejeitar a hipótese nula.
Interpretação: A média dos ângulos do cosseno não difere significativamente de 85 graus.
Isso