<a href="https://colab.research.google.com/github/seymakayaa/Ielts_Writing_Predictive/blob/main/ielts_predict.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [26]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px
from google.colab import files
from collections import Counter
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error


In [2]:
data = pd.read_csv("ielts_writing_dataset.csv")

In [16]:
data.shape

(1435, 9)

In [17]:
data.columns

Index(['Task_Type', 'Question', 'Essay', 'Examiner_Commen', 'Task_Response',
       'Coherence_Cohesion', 'Lexical_Resource', 'Range_Accuracy', 'Overall'],
      dtype='object')

In [11]:
texts = data['Question']
word_counter = Counter()
for text in texts:
    words = text.split()  # Split the text into words on whitespace
    word_counter.update(words)

print(words)  # Note: `words` holds only the tokens of the last question in the loop

['Modern', 'medicine', 'helps', 'to', 'live', 'a', 'longer', 'life.', 'Do', 'you', 'agree?']
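The output above shows a side effect of `str.split()`. It splits only on whitespace, so punctuation stays attached and case is preserved. As a result, `life.` and `life`, or `Do` and `do`, are counted as different words. Below is a minimal sketch of a normalized alternative. It assumes that lowercasing and a simple letters/apostrophes regex are acceptable for this analysis. The toy questions are made up for illustration and are not taken from the dataset.

```python
import re
from collections import Counter

# Hypothetical toy questions, for illustration only
questions = [
    "Modern medicine helps to live a longer life. Do you agree?",
    "Some people say life in cities is better. Do you agree?",
]

def tokenize(text):
    """Lowercase the text and keep runs of letters/apostrophes as tokens."""
    return re.findall(r"[a-z']+", text.lower())

raw_counts = Counter()
norm_counts = Counter()
for q in questions:
    raw_counts.update(q.split())      # whitespace split, as in the cell above
    norm_counts.update(tokenize(q))   # lowercased, punctuation stripped

print(raw_counts["life."], raw_counts["life"])  # 1 1: punctuation splits the count
print(norm_counts["life"])                      # 2: both occurrences merged
```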


In [12]:

# Display the most frequent words and their frequencies
most_common_words = word_counter.most_common(10)
print("Most Frequent Words:")
for word, frequency in most_common_words:
    print(f"{word}: {frequency} times")


Most Frequent Words:
the: 2878 times
and: 2447 times
of: 1177 times
your: 1132 times
for: 1074 times
in: 979 times
a: 945 times
or: 896 times
to: 781 times
information: 743 times
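Function words such as `the`, `and` and `of` dominate this list, and they say little about what the prompts are about. One way to surface content words is to drop English stop words before ranking. The sketch below assumes scikit-learn's built-in `ENGLISH_STOP_WORDS` is an acceptable stop-word set. That is a generic list, not one tuned for IELTS prompts. The token stream is made up for illustration.

```python
from collections import Counter
from sklearn.feature_extraction.text import ENGLISH_STOP_WORDS

# Hypothetical token stream; in the notebook this would be the words counted above
tokens = ["the", "chart", "shows", "the", "information", "and", "the", "chart",
          "gives", "information", "about", "the", "population"]

counts = Counter(tokens)
# Keep only tokens that are not in scikit-learn's English stop-word list
content_counts = Counter({w: c for w, c in counts.items()
                          if w.lower() not in ENGLISH_STOP_WORDS})

print(counts.most_common(2))          # 'the' ranks first
print(content_counts.most_common(2))  # 'chart' and 'information', 2 each
```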


In [13]:

# N-gram analysis
ngram_vectorizer = CountVectorizer(ngram_range=(2, 2))  # n-grams of exactly two words (bigrams)
ngram_counts = ngram_vectorizer.fit_transform(texts)
ngram_frequencies = ngram_counts.sum(axis=0).tolist()[0]
ngram_features = ngram_vectorizer.get_feature_names_out()



In [14]:

# Display the most frequent bigrams
most_common_ngrams = list(zip(ngram_features, ngram_frequencies))
most_common_ngrams.sort(key=lambda x: x[1], reverse=True)
print("\nMost Frequent Bigrams:")
for ngram, frequency in most_common_ngrams[:10]:
    print(f"{ngram}: {frequency} times")


Most Frequent Bigrams:
the information: 627 times
your own: 532 times
reasons for: 485 times
and include: 463 times
for your: 463 times
your answer: 463 times
answer and: 462 times
any relevant: 462 times
examples from: 462 times
from your: 462 times


In [19]:
X = data['Essay']  # Text column (the essays)
y = data['Overall']  # Score column (overall band)

In [20]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
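With `test_size=0.2`, scikit-learn puts `ceil(0.2 * n_samples)` rows in the test set. The 1,435 essays therefore split into 1,148 training rows and 287 test rows. Fixing `random_state=42` makes the shuffle reproducible across runs. Here is a quick check on a stand-in index array rather than the real data:

```python
import numpy as np
from sklearn.model_selection import train_test_split

indices = np.arange(1435)  # stand-in for the 1,435 essays
train_idx, test_idx = train_test_split(indices, test_size=0.2, random_state=42)
train_again, test_again = train_test_split(indices, test_size=0.2, random_state=42)

print(len(train_idx), len(test_idx))         # 1148 287
print(np.array_equal(test_idx, test_again))  # True: same seed, same split
```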

In [22]:
tfidf_vectorizer = TfidfVectorizer(max_features=5000)  # Keep only the 5,000 most frequent terms
X_train_tfidf = tfidf_vectorizer.fit_transform(X_train)
X_test_tfidf = tfidf_vectorizer.transform(X_test)
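The vectorizer is fitted on the training essays only and then reused to transform the test essays. This way the vocabulary and IDF weights never see test data. Words that occur only in the test set are simply ignored. Here is a small sketch with made-up sentences; `max_features` is omitted because the vocabulary is tiny.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical sentences, for illustration only
train_docs = ["the chart shows rising sales", "the graph shows falling prices"]
test_docs = ["the chart shows unemployment"]  # 'unemployment' never appears in training

vectorizer = TfidfVectorizer()
X_train = vectorizer.fit_transform(train_docs)  # learns vocabulary and IDF from train only
X_test = vectorizer.transform(test_docs)        # reuses them; unseen words get no column

print("unemployment" in vectorizer.vocabulary_)  # False
print(X_train.shape, X_test.shape)              # same number of columns
```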

In [24]:
model = LinearRegression()
model.fit(X_train_tfidf, y_train)
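With up to 5,000 TF-IDF features and about 1,148 training essays, this regression has more features than samples. Ordinary least squares can then fit the training scores almost exactly, so held-out error is the only meaningful check. The sketch below reproduces that effect on random synthetic data. As one common alternative, not the method this notebook uses, it also fits an L2-regularized `Ridge` model. The sizes and `alpha=1.0` are arbitrary choices.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
n_samples, n_features = 50, 200  # more features than samples, as with TF-IDF here
X = rng.normal(size=(n_samples, n_features))
y = rng.normal(loc=6.0, scale=1.0, size=n_samples)  # pure-noise "scores": nothing to learn

ols = LinearRegression().fit(X, y)
ridge = Ridge(alpha=1.0).fit(X, y)

ols_train_mse = mean_squared_error(y, ols.predict(X))
ridge_train_mse = mean_squared_error(y, ridge.predict(X))
print(f"OLS train MSE:   {ols_train_mse:.2e}")    # essentially zero: the noise is memorized
print(f"Ridge train MSE: {ridge_train_mse:.2e}")  # larger: the penalty shrinks coefficients
```

On the real data, models should be compared by test-set error, and `alpha` would normally be tuned with cross-validation rather than fixed.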

In [25]:
y_pred = model.predict(X_test_tfidf)

In [27]:
mse = mean_squared_error(y_test, y_pred)
print("Mean Squared Error:", mse)


Mean Squared Error: 0.98117134539746
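MSE is in squared band units, so its square root is easier to read. An MSE of 0.981 corresponds to an RMSE of about 0.99, meaning the root-mean-square prediction error is roughly one band. The helper below computes MSE, RMSE and MAE and is shown on made-up scores. A useful sanity check, not run in this notebook, is to compare these errors against a baseline that always predicts the mean training score, for example scikit-learn's `DummyRegressor`.

```python
import numpy as np

def regression_errors(y_true, y_pred):
    """Return MSE, RMSE and MAE for two equal-length sequences of scores."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    errors = y_pred - y_true
    mse = np.mean(errors ** 2)
    return {"mse": mse, "rmse": np.sqrt(mse), "mae": np.mean(np.abs(errors))}

# Hypothetical band scores, for illustration only
scores = regression_errors([6.0, 7.0, 5.5], [6.5, 6.0, 5.5])
print(scores)                     # mse ≈ 0.417, rmse ≈ 0.645, mae = 0.5
print(np.sqrt(0.98117134539746))  # RMSE implied by the notebook's MSE, ≈ 0.99
```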


In [29]:
user_input = input("Enter a text: ")

# Vectorize the input with the TF-IDF vectorizer fitted on the training essays
user_input_tfidf = tfidf_vectorizer.transform([user_input])

# Predict with the trained model (predict returns a 1-element array)
user_prediction = model.predict(user_input_tfidf)

print("Predicted Overall score:", user_prediction[0])


Enter a text: Information about the thousands of visits from overseas to three different European natural places during 1987 and 2007 is provided in the given line chart. Overall, it can be seen that the number of visitors increased significantly in the three places compared to the initial year. Although, visits to Europeans lakes demostrated more changes over the 20 years than its counterparts. In more detail, the most steady growth was experienced by the visits to Europeans mountains. For example, from 1987 the number of visitors grew from 20,000 to almost the double 20 years later. Similarly, visits to the coast also rose after a slight fall in 1992, reaching almost twice as much since 1987, with 75,000. Those visiting Europeans lakes subtantially increased over the years from 10 thousand to a peak of 75 thousand in 2002. Despite falling for about 25 thousand in 2007, the visitis to this place remained higher compared to 1987, with 50,000 at the end of the period.
Predicted
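`LinearRegression` outputs an unbounded real number, while IELTS band scores run from 0 to 9 in half-band steps. A raw prediction such as 6.73, or any value outside 0 to 9, is therefore not a valid band. The sketch below post-processes predictions under two assumptions: rounding to the nearest half band, with halves rounded up, and clipping to [0, 9]. `to_band` is a hypothetical helper, not part of the original notebook.

```python
import numpy as np

def to_band(prediction):
    """Hypothetical helper: snap raw predictions to the 0-9 IELTS scale in 0.5 steps.

    Rounds to the nearest half band (ties go up) and clips to [0, 9].
    """
    pred = np.asarray(prediction, dtype=float)
    bands = np.floor(pred * 2 + 0.5) / 2  # nearest multiple of 0.5, ties rounded up
    return np.clip(bands, 0.0, 9.0)

print(to_band([6.73, 6.25, 9.7, -0.3]))  # -> 6.5, 6.5, 9.0, 0.0
```

In the cell above, this would be applied as `to_band(user_prediction)[0]` after `model.predict`.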