<a href="https://colab.research.google.com/github/mertalicagci/ai-log-question-answer-system/blob/main/logtraficc24.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>
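The notebook's `log_pattern` captures the whole request line as a single string and requires a numeric size field. The sketch below is a minimal, stdlib-only variant. It assumes the input follows the Common Log Format and uses English month abbreviations, which `strptime`'s `%b` expects under the default C locale. It also extracts the timestamp and splits the request into method, path and protocol. `CLF_PATTERN`, `LogEntry` and `parse_line` are hypothetical names that are not part of the original notebook:

```python
import re
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

# Common Log Format: host ident authuser [date] "method path protocol" status bytes
# (hypothetical pattern; named groups make each field self-documenting)
CLF_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)


@dataclass
class LogEntry:
    ip: str
    timestamp: datetime
    method: str
    path: str
    protocol: str
    status: int
    size: int


def parse_line(line: str) -> Optional[LogEntry]:
    """Parse one CLF line; return None if the line does not match."""
    m = CLF_PATTERN.match(line.strip())
    if m is None:
        return None
    return LogEntry(
        ip=m['ip'],
        # e.g. "10/Aug/2024:14:55:36 +0000" -> timezone-aware datetime
        timestamp=datetime.strptime(m['time'], '%d/%b/%Y:%H:%M:%S %z'),
        method=m['method'],
        path=m['path'],
        protocol=m['protocol'],
        status=int(m['status']),
        # CLF writes "-" instead of 0 when no body bytes were sent
        size=0 if m['size'] == '-' else int(m['size']),
    )


print(parse_line('192.168.1.2 - - [10/Aug/2024:14:56:02 +0000] "POST /login HTTP/1.1" 200 567'))
```

Accepting `-` in the size field matters in practice: servers log `-` for responses without a body (for example many 304s), and a pattern ending in `(\d+)` silently drops such lines.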

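The retrieval step works because `TfidfVectorizer` L2-normalizes each row by default (`norm='l2'`). For unit vectors, `||a - b||^2 = 2 - 2 cos(a, b)`, so the squared-L2 ranking returned by `faiss.IndexFlatL2` matches a cosine-similarity ranking. The pure-Python sketch below performs the same exhaustive search as a flat index, so the retrieval can be inspected without FAISS. It is intended to mirror `TfidfVectorizer`'s default settings (lowercasing, the default token pattern, smoothed IDF, L2 normalization), but that is an assumption: it is not the scikit-learn implementation. `fit_idf`, `tfidf_vector`, `squared_l2` and `search` are hypothetical helper names:

```python
import math
import re
from collections import Counter

# Assumed to match TfidfVectorizer's default token_pattern: 2+ word characters, lowercased
TOKEN_RE = re.compile(r"(?u)\b\w\w+\b")


def tokenize(text):
    return TOKEN_RE.findall(text.lower())


def fit_idf(docs):
    """Smoothed IDF (as with smooth_idf=True): ln((1 + n) / (1 + df)) + 1."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(tokenize(doc)))
    return {term: math.log((1 + n) / (1 + count)) + 1 for term, count in df.items()}


def tfidf_vector(text, idf):
    """Sparse TF-IDF vector (term -> weight), L2-normalized like norm='l2'."""
    counts = Counter(t for t in tokenize(text) if t in idf)  # unseen terms are dropped
    vec = {t: c * idf[t] for t, c in counts.items()}
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else vec


def squared_l2(a, b):
    """Squared Euclidean distance between two sparse vectors."""
    return sum((a.get(t, 0.0) - b.get(t, 0.0)) ** 2 for t in set(a) | set(b))


def search(query, docs, idf, k=1):
    """Exhaustive nearest-neighbour search returning (squared distances, indices)."""
    q = tfidf_vector(query, idf)
    scored = sorted((squared_l2(q, tfidf_vector(d, idf)), i) for i, d in enumerate(docs))[:k]
    return [dist for dist, _ in scored], [i for _, i in scored]


istekler = [
    "GET /index.html HTTP/1.1",
    "POST /login HTTP/1.1",
    "GET /about.html HTTP/1.1",
    "GET /contact.html HTTP/1.1",
]
idf = fit_idf(istekler)
print(search("GET /index.html HTTP/1.1", istekler, idf, k=2))
```

A flat index compares the query with every stored vector, so results are exact. For a few thousand log lines that is usually fine; approximate FAISS index types (such as IVF) trade some exactness for speed on much larger collections.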
In [3]:
!pip install faiss-cpu
import pandas as pd
import re
from sklearn.feature_extraction.text import TfidfVectorizer
import faiss
from transformers import T5Tokenizer, T5ForConditionalGeneration
import time

# Log data (sample lines in Common Log Format)
log_verileri = """
192.168.1.1 - - [10/Aug/2024:14:55:36 +0000] "GET /index.html HTTP/1.1" 200 1234
192.168.1.2 - - [10/Aug/2024:14:56:02 +0000] "POST /login HTTP/1.1" 200 567
192.168.1.3 - - [10/Aug/2024:14:57:10 +0000] "GET /about.html HTTP/1.1" 404 0
192.168.1.4 - - [10/Aug/2024:14:57:50 +0000] "GET /contact.html HTTP/1.1" 200 234
"""

# Split the log data into lines
log_satirlari = log_verileri.strip().split('\n')

# Regex pattern: IP address, request line, status code, response size
log_pattern = r'(\S+) - - \[.*?\] "(.*?)" (\d{3}) (\d+)'
log_girisleri = []

for satir in log_satirlari:
    eslesen = re.match(log_pattern, satir)
    if eslesen:
        ip_adresi, istek, durum_kodu, boyut = eslesen.groups()
        log_girisleri.append([ip_adresi, istek, int(durum_kodu), int(boyut)])

# Build the DataFrame
log_df = pd.DataFrame(log_girisleri, columns=['IP_Adresi', 'Istek', 'Durum_Kodu', 'Boyut'])

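# Data cleaning: drop incomplete rows and keep only IPv4-style addresses
log_df = log_df.dropna()
# fullmatch rejects values with trailing junk; .copy() makes the column assignment below act on an independent frame
log_df = log_df[log_df['IP_Adresi'].str.fullmatch(r'\d+\.\d+\.\d+\.\d+')].copy()
log_df['Boyut'] = log_df['Boyut'].astype(int)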

# Vectorize the 'Istek' (request) column with TF-IDF
vektorleyici = TfidfVectorizer()
tfidf_matrix = vektorleyici.fit_transform(log_df['Istek'])

# Build the FAISS index (FAISS operates on float32 vectors)
dimension = tfidf_matrix.shape[1]
index = faiss.IndexFlatL2(dimension)
index.add(tfidf_matrix.toarray().astype('float32'))

# Load the T5 tokenizer and model
tokenizer = T5Tokenizer.from_pretrained('t5-base')
model = T5ForConditionalGeneration.from_pretrained('t5-base')

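# Run the query: retrieve the most similar log lines from the FAISS index
sorgu = "GET /index.html HTTP/1.1"
baslangic = time.perf_counter()
sorgu_vector = vektorleyici.transform([sorgu]).toarray().astype('float32')
k = min(4, index.ntotal)  # never request more neighbours than stored vectors
D, I = index.search(sorgu_vector, k)
getirilen_loglar = log_df.iloc[I[0]]

# Generate an answer, passing the retrieved log lines to T5 as context
baglam = ' | '.join(
    f"{r.IP_Adresi} {r.Istek} {r.Durum_Kodu} {r.Boyut}"
    for r in getirilen_loglar.itertuples(index=False)
)
input_ids = tokenizer.encode(f"question: {sorgu} context: {baglam}", return_tensors='pt')
outputs = model.generate(input_ids, max_new_tokens=50)
yanit = tokenizer.decode(outputs[0], skip_special_tokens=True)
sure = time.perf_counter() - baslangic

# Print the answer, the retrieved logs and the response time
print(f'Question: {sorgu}')
print(f'Answer: {yanit}')
print('Retrieved Logs:')
print(getirilen_loglar.to_string(index=False))
print(f'Response Time: {sure:.4f} seconds')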




Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.


Question: What pages returned a 200 status?
Answer:
Pages that returned a 200 status: GET /index.html HTTP/1.1, POST /login HTTP/1.1, GET /contact.html HTTP/1.1
Retrieved Logs:
  IP_Adresi                      Istek  Durum_Kodu  Boyut
192.168.1.1   GET /index.html HTTP/1.1         200   1234
192.168.1.2       POST /login HTTP/1.1         200    567
192.168.1.3   GET /about.html HTTP/1.1         404      0
192.168.1.4 GET /contact.html HTTP/1.1         200    234
Response Time: 0.0125 seconds

Question: Which IP address accessed /login?
Answer:
The IP address that accessed the /login page is: 192.168.1.2
Retrieved Logs:
  IP_Adresi                      Istek  Durum_Kodu  Boyut
192.168.1.2       POST /login HTTP/1.1         200    567
192.168.1.1   GET /index.html HTTP/1.1         200   1234
192.168.1.4 GET /contact.html HTTP/1.1         200    234
192.168.1.3   GET /about.html HTTP/1.1         404      0
Response Time: 0.0124 seconds

Question: Which pages were not found?
Answer:
Pa