# INFO 4271 - Group Project

Issued: June 11, 2024

Due: July 22, 2024

Please submit a link to your code base (ideally with a branch that does not change anymore after the submission deadline) and your 4-page report via email to carsten.eickhoff@uni-tuebingen.de by the due date. One submission per team.

---

# 1. Web Crawling & Indexing
Crawl the web to discover **English content related to Tübingen**. The crawled content should be stored locally. If interrupted, your crawler should be able to re-start and pick up the crawling process at any time.
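
The restart requirement can be met by writing the crawler state to disk at regular intervals. The following is a minimal sketch under stated assumptions, not part of the crawler below: the file name `crawler_state.json` and the helpers `save_state` and `load_state` are hypothetical.

```python
import json
import os
import tempfile

STATE_FILE = "crawler_state.json"  # hypothetical file name


def save_state(frontier, links_seen, path=STATE_FILE):
    """Atomically write the frontier and the set of seen links to disk."""
    state = {"frontier": list(frontier), "links_seen": sorted(links_seen)}
    directory = os.path.dirname(os.path.abspath(path))
    # Write to a temporary file first so that an interruption
    # never leaves a half-written state file behind.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        json.dump(state, f, ensure_ascii=False)
    os.replace(tmp_path, path)


def load_state(seed_urls, path=STATE_FILE):
    """Return (frontier, links_seen); fall back to the seed set on a fresh start."""
    if os.path.exists(path):
        with open(path, encoding="utf-8") as f:
            state = json.load(f)
        return state["frontier"], set(state["links_seen"])
    return list(seed_urls), set(seed_urls)
```

Calling `save_state` after each processed link and `load_state` before crawling would let an interrupted run resume. The execution cell would then also have to skip resetting `index.json` when a state file exists.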

In [59]:
#### Imports ####
import requests
from boilerpy3 import extractors
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize
from nltk import pos_tag
import re
from trafilatura import fetch_url, extract
import contractions
from nltk.stem import WordNetLemmatizer
from duplicateCheck import check_simhash,computeHash
import math
import numpy as np
from sklearn.cluster import KMeans
import json
import os
from urllib.parse import urljoin
from urllib.parse import urlparse
from urllib.robotparser import RobotFileParser as rp
from bs4 import BeautifulSoup as bs
import html5lib
import stopit
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

In [None]:
#### preprocesses a term t 
#returns preprocessed t or False in case term is uninformative
def preprocess(t, tag):
 #use NLTK stop words for English and German (English pages often still contain German stop words)
 stop_words = set(stopwords.words('english')).union(set(stopwords.words('german')))

 #initialize lemmatizer
 lemmatizer = WordNetLemmatizer()

 #convert to lowercase
 t = t.lower()

 #is url (starting with http(s) or www.)
 if re.compile(r'https?://\S+|www\.\S+').match(t):
    return False
      
 #token starts with a special character or punctuation -> uninformative
 if re.compile(r'[^a-zA-Z0-9äöüÄÖÜ\s]').match(t):
    return False
 
 #is stopword
 if t in stop_words:
    return False 
 
 #remove also special chars inside token
 t = re.sub(r'[^a-zA-Z0-9äöüÄÖÜ\s]', '', t)
      
 #convert pos tag for lemmatization
 ltag = tag[0].lower()
 if ltag in ['a', 'r', 'n', 'v']:
          t = lemmatizer.lemmatize(t, ltag)

 return t


In [2]:
#### Indexing ####

#Add a document to the index. You need (at least) two parameters:
    #doc: The document to be indexed.
    #index: The location of the local index storing the discovered documents.
def index(doc, index):
 url = doc['url']
 indexPath = index

 #load the current index from index.json
 with open(indexPath, encoding='utf-8') as f:
     index = json.load(f)
 docID = len(index[1])
 print(docID)
 doc = doc['doc']


####  Text Preprocessing of the doc ###################

 #extract relevant text from raw html doc (trafilatura's extract returns None if no main text is found)
 content = extract(doc) or ''


###storing unprocessed text content ##### -> store as file at end if no duplicate
 docContents={}
 soup = bs(doc, 'html.parser')
 title = soup.find('title').text if soup.find('title') else 'No title'
 docContents['title'] = title
 meta_desc = soup.find('meta', attrs={'name': 'description'})

 #use the meta description if available, otherwise the first 40 words of the text
 #(fewer if the doc is shorter), stripped of special characters
 if meta_desc and meta_desc.get('content'):
     docContents['text'] = meta_desc['content']
 else:
     docContents['text'] = re.sub(r'[^a-zA-Z0-9äöüÄÖÜ\s.,?!-]', '', ' '.join(content.split()[:40]))

 #docContents['content'] = content
 # Find the first image
 first_image = soup.find('img')
 
 
 #convert to lowercase
 content = content.lower()

 #remove contractions:
 content = contractions.fix(content)
 

 #store words of the processed document
 processedDoc = []
 #tokenize, go through all words and do the steps for each at once
 for position, (t, tag) in enumerate(pos_tag(word_tokenize(content))):
    
       #break up hyphenated words and preprocess every part (not only the first two):
       if '-' in t:
           for part in t.split('-'):
               partPreprocessed = preprocess(part, tag)
               #if not uninformative
               if partPreprocessed:
                   processedDoc.append([partPreprocessed, position])
       else:
           termPreprocessed = preprocess(t, tag)
           if termPreprocessed:
               processedDoc.append([termPreprocessed, position])

  ############# Near Duplicate checking ##############
  #near duplicate checking by means of the words in the processed document (processedDoc)

 docHash = computeHash([l[0] for l in processedDoc])
 # compare to other doc hashes
 if(check_simhash(docHash, index[2])):
     #do not index near duplicates
     return False
 #if no duplicate save hash and insert terms in inverted index
 index[2].append(docHash)



############ Build up inverted index #####################
 for term, position in processedDoc:
     
      #entry in a posting list: [docID, [position1, position2, ...], skip pointer, idf value]
      #build the inverted index on the fly
      if term in index[0]:
           postings = index[0][term]
           #docs are indexed one at a time, so a posting for this docID can only be the last one
           if postings[-1][0] == docID:
                postings[-1][1].append(position)
           #otherwise start a new posting for this doc (skip pointer and idf are filled in later by cluster())
           else:
                postings.append([docID, [position], None, None])
      #term not yet in index: create its posting list
      else:
           index[0][term] = [[docID, [position], None, None]]

 length = len(processedDoc)

 #add url , cluster of document (None so far) and length of preprocessed doc to list index[1] after indexing the doc
 index[1].append([url, None, length])

 #write changed index object to json
 with open(indexPath, 'w', encoding='utf-8') as f:
     json.dump(index, f, ensure_ascii=False)
 
 #save doc in documents folder: name <docID>.json
 with open(os.path.join(os.getcwd(),"documents", str(docID) + ".json"), 'w', encoding='utf-8') as f:
     json.dump(docContents, f, ensure_ascii=False)
 
 #save first image of page if found
 if first_image and first_image.get('src'):
     img_url = first_image['src']
     # Convert relative URL to absolute URL
     absolute_image_url = urljoin(url, img_url)
     #take the extension from the URL path so query strings do not end up in the file name
     f_ext = os.path.splitext(urlparse(absolute_image_url).path)[-1]
     image_fetched = requests.get(absolute_image_url, timeout=10)
     if image_fetched.status_code == 200:
         #save img in pictures folder: name <docID><extension>
         with open(os.path.join(os.getcwd(), "pictures", str(docID)) + f_ext, 'wb') as f:
             f.write(image_fetched.content)

 return True
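
The posting layout that `index` builds can be illustrated without any file I/O. This is a minimal sketch on already tokenized toy documents; the helper `add_posting` is a hypothetical name that mirrors the update logic above.

```python
def add_posting(inverted_index, term, doc_id, position):
    """Append one occurrence of `term` to a positional inverted index.

    Postings have the layout [docID, [positions], skip_pointer, idf];
    the last two slots stay None until they are filled in after crawling.
    """
    postings = inverted_index.setdefault(term, [])
    # Documents are indexed one at a time, so a posting for doc_id
    # can only be the last entry of the list.
    if postings and postings[-1][0] == doc_id:
        postings[-1][1].append(position)
    else:
        postings.append([doc_id, [position], None, None])


inverted_index = {}
docs = [["tübingen", "castle", "tübingen"], ["castle", "museum"]]
for doc_id, tokens in enumerate(docs):
    for position, term in enumerate(tokens):
        add_posting(inverted_index, term, doc_id, position)

print(inverted_index["tübingen"])  # [[0, [0, 2], None, None]]
print(inverted_index["castle"])    # [[0, [1], None, None], [1, [0], None, None]]
```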
 
    
      

In [3]:
#### Clustering ####
#clusters the docs of the index and inserts the labels into the index (currently 10 clusters)
# cluster using kmeans clustering with tf-idf vectors
def cluster(index):
     print('start clustering')
     indexPath = index
     #load the current index from index.json
     with open(indexPath, encoding='utf-8') as f:
         index = json.load(f)
     #convert index to tf-idf vector representations for each document
     idx = index[0]
     docs = index[1]
     terms = list(idx.keys())
     #matrix to store vectors (rows: documents, cols: terms)
     tfIDFMatrix = np.zeros((len(docs), len(terms)))
     print('buildMatrix')

     for t, term in enumerate(terms):
         #idf = log(#docs / #docs containing term); identical for every posting of the term
         idfValue = math.log(len(docs)/len(idx[term]))
         for posting in idx[term]:
             docID = posting[0]
             #tf = number of occurrences of term in doc / length of doc
             tfValue = len(posting[1])/docs[docID][2]
             tfIDFMatrix[docID][t] = tfValue * idfValue

             #store idf value in the posting on the fly (for BM25 later on)
             posting[3] = idfValue
     
     print('endBuildmatrix')

     #also calculate avg doc length and append to index
     docLengths = [l[2] for l in index[1]]
     avgLength = sum(docLengths)/len(docLengths)
     index[3] = avgLength
      
     #place skip pointers once after the whole index is built
     for term in index[0].keys():
        #sqrt(p) evenly spaced skip pointers per posting list (p: length of the posting list of the term)
        p = len(index[0][term])
        #only if the posting list has at least length 4
        if(p >= 4):
            nrPointers = math.sqrt(p)
            spaceBetween = math.floor(p/nrPointers)
            #current index in postingslist
            i = 0
            while(i + spaceBetween < p):
                #set skip pointer [idx to jump to in postings list, docID at that index]
                index[0][term][i][2] = [i + spaceBetween, index[0][term][i + spaceBetween][0]]
                i += spaceBetween

     #store new  index with tf idf values 
     #write changed index object to json
     with open(indexPath, 'w', encoding='utf-8') as f:
      json.dump(index, f, ensure_ascii=False)
     # Initialize KMeans clustering
     print('clustering')
     num_clusters = 10  # Example: Number of clusters
     kmeans = KMeans(n_clusters=num_clusters, random_state=42)

     # Fit KMeans model to the TF-IDF matrix
     kmeans.fit(tfIDFMatrix)

     docLabels = kmeans.labels_
     #centroids of each cluster:
     centroids = kmeans.cluster_centers_
     #get term with highest tf-idf score per cluster centroid
     centroidTopics = []
     for centroid in centroids:
        centroidTopics.append(list(idx.keys())[np.argmax(centroid)])
     
     #insert labels in index
     for d in range(len(index[1])):
         index[1][d][1] = centroidTopics[docLabels[d]]
      
     print('endClustering')
        
      #write changed index object to json
     with open(indexPath, 'w', encoding='utf-8') as f:
      json.dump(index, f, ensure_ascii=False)
     
     return True
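
The skip pointer placement used in `cluster` can be checked in isolation. This is a minimal sketch on a toy posting list; the function name `place_skip_pointers` is hypothetical and the logic mirrors the loop above.

```python
import math


def place_skip_pointers(postings):
    """Set roughly sqrt(p) evenly spaced skip pointers on a posting list, in place."""
    p = len(postings)
    if p < 4:  # short lists gain nothing from skipping
        return postings
    step = math.floor(p / math.sqrt(p))
    i = 0
    while i + step < p:
        target = i + step
        # Skip pointer: [index to jump to, docID stored at that index]
        postings[i][2] = [target, postings[target][0]]
        i = target
    return postings


postings = [[doc_id, [0], None, None] for doc_id in (0, 3, 5, 8, 13, 21, 34, 55, 89)]
place_skip_pointers(postings)
print([entry[2] for entry in postings])
# [[3, 8], None, None, [6, 34], None, None, None, None, None]
```

With nine postings the step is 3, so pointers are placed at positions 0 and 3.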

In [4]:
#### Crawling ####
#Crawl the web. You need (at least) two parameters:
#frontier: The frontier of known URLs to crawl. You will initially populate this with your seed set of URLs and later maintain all discovered (but not yet crawled) URLs here.
#index: The location of the local index storing the discovered documents. 
STORAGE_LOC = "index.json"
# frontier = ["https://tuebingenresearchcampus.com/en/tuebingen",
#             "https://tunewsinternational.com/category/news-in-english/",
#             "https://www.tuebingen.de/en/",
#             "https://uni-tuebingen.de/en/",
#             "https://www.germany.travel/en/cities-culture/tuebingen.html",
#             "https://www.iwm-tuebingen.de/www/en/index.html",
#              "https://kunsthalle-tuebingen.de/en/",
#              "https://www.opentable.com/food-near-me/stadt-tubingen-germany",
#              "https://historicgermany.travel/historic-germany/tubingen/",
#              "https://en.wikipedia.org/wiki/T%C3%BCbingen",
#               "https://en.wikipedia.org/wiki/University_of_T%C3%BCbingen",
#              ]
frontier = ["https://www.opentable.com/food-near-me/stadt-tubingen-germany",
"https://alma.uni-tuebingen.de/alma/rds?state=user&type=0&language=en",
"https://exchange.uni-tuebingen.de/owa/auth/logon.aspx?replaceCurrent=1&url=https%3a%2f%2fexchange.uni-tuebingen.de%2fowa",
"https://fit.uni-tuebingen.de/",
"https://uni-tuebingen.de/en/einrichtungen/personalvertretungen-beratung-beauftragte/betriebliches-gesundheitsmanagement/",
"https://uni-tuebingen.de/en/einrichtungen/personalvertretungen-beratung-beauftragte/datenschutzbeauftragter/",
"https://uni-tuebingen.de/en/einrichtungen/personalvertretungen-beratung-beauftragte/digital-transformation-lab/",
"http://alma.uni-tuebingen.de/alma/pages/cs/sys/portal/hisinoneStartPage.faces",
"https://alma.uni-tuebingen.de/alma/pages/cs/sys/portal/hisinoneStartPage.faces",
"https://epv-welt.uni-tuebingen.de/RestrictedPages/StartSearch.aspx",
"https://uni-tuebingen.de/en/facilities/administration/iv-student-affairs/student-administration/student-administration/",
"https://uni-tuebingen.de/en/facilities/university-library/",
"https://uni-tuebingen.de/en/international/study-in-tuebingen/advice-and-counseling-for-international-students/",
"https://uni-tuebingen.de/en/university/",
"https://uni-tuebingen.de/en/university/profile/",
"https://uni-tuebingen.de/en/university/profile/facts-and-figures/",
"https://uni-tuebingen.de/en/university/profile/values-and-visions/",
"https://uni-tuebingen.de/en/university/profile/awards-and-distinctions/",
"https://uni-tuebingen.de/en/universitaet/profil/freunde-und-foerderer/",
"https://en.wikipedia.org/wiki/Protestant_Reformation"]
links_seen = frontier.copy()
saved = []

def crawl(frontier, indexPath):
    
    #get  first document of frontier while frontier not empty
    while len(frontier) != 0:
        # with stopit.ThreadingTimeout(20) as context_manager:
            link = frontier.pop(0)
            try:
                base_link = urlparse(link).netloc
                headers = {"User-Agent": "Mozilla/5.0 (X11; CrOS x86_64 12871.102.0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.141 Safari/537.36"}

                # one session with retries for both the robots.txt and the page request
                session = requests.Session()
                retry = Retry(total=5, backoff_factor=0.1, status_forcelist=[500, 502, 503, 504])
                adapter = HTTPAdapter(max_retries=retry)
                session.mount('http://', adapter)
                session.mount('https://', adapter)

                # check if we are allowed to access the website
                robots_file_loc = urlparse(link).scheme + "://" + base_link + "/robots.txt"
                robots_file = session.get(robots_file_loc, headers=headers, timeout=20)
                if robots_file.ok:
                    # parse the robots.txt we already downloaded instead of fetching it a second time
                    robot_parser = rp()
                    robot_parser.parse(robots_file.text.splitlines())
                    if not robot_parser.can_fetch("*", link):
                        session.close()
                        continue
                elif robots_file.status_code != 404:
                    session.close()
                    continue

                document = session.get(link, headers=headers, timeout=20).text
                session.close()
                html5_doc = html5lib.parse(document)
                soup = bs(document, "html.parser")

                # remove script and style elements so they do not count as page text
                for script in soup(["script", "style"]):
                    script.extract()

                # check if document is relevant: English language attribute and a mention of Tübingen
                doc_lang = html5_doc.get("lang")
                if doc_lang is None:
                    doc_lang = html5_doc.get("xml:lang")
                if doc_lang is None:
                    continue
                body = soup.find("body")
                relevant = body is not None and "Tübingen" in body.text
                if "en" in doc_lang and relevant:
                    # process doc to right format and index
                    doc = {"url": link, "doc": document }
                    # index document; index() returns False for near duplicates
                    if not index(doc, indexPath):
                        print("Duplicate")
                        continue
                    # get all links from document and save to frontier if not seen yet
                    for a in soup.find_all('a'):
                        l = a.get('href')
                        if not l or l.startswith('#'):
                            continue
                        # resolve relative and scheme-less links against the current page
                        l = urljoin(link, l)
                        if urlparse(l).scheme not in ('http', 'https'):
                            continue
                        if l not in links_seen:
                            frontier.append(l)
                            links_seen.append(l)
            except Exception as err:
                # if context_manager.state == context_manager.TIMED_OUT:
                #     print("Timed out at link: ", link)
                # else:
                    print("Exception occurred at link: ", link, ". Description: ", err)
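
Resolving the links found on a page against the page URL can be done with the standard library alone. This is a minimal sketch; the helper `normalize_link` is a hypothetical name and is not part of the crawler above.

```python
from urllib.parse import urljoin, urldefrag, urlparse


def normalize_link(page_url, href):
    """Resolve `href` against `page_url` and drop the fragment.

    Returns None for links that are not HTTP(S), e.g. mailto: or tel:.
    """
    absolute, _fragment = urldefrag(urljoin(page_url, href))
    if urlparse(absolute).scheme not in ("http", "https"):
        return None
    return absolute


page = "https://uni-tuebingen.de/en/university/profile/"
print(normalize_link(page, "facts-and-figures/"))
# https://uni-tuebingen.de/en/university/profile/facts-and-figures/
print(normalize_link(page, "/en/university/#top"))
# https://uni-tuebingen.de/en/university/
print(normalize_link(page, "//www.tuebingen.de/en/"))
# https://www.tuebingen.de/en/
print(normalize_link(page, "mailto:info@example.org"))
# None
```

Dropping the fragment keeps `page#a` and `page#b` from entering the frontier as two different URLs.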



In [5]:
#### Execution (crawling, indexing, clustering) ####
## delete content of documents and pictures folder if there is some
# so the process can be started again without resetting everything by hand
docs = [os.path.join(os.getcwd(), "documents", f) for f in os.listdir(os.path.join(os.getcwd(), "documents"))]
for f in docs:
    os.remove(f)

images = [os.path.join(os.getcwd(), "pictures", f) for f in os.listdir(os.path.join(os.getcwd(), "pictures"))]
for i in images:
    os.remove(i)

#write the empty index [{}, [], [], None] to index.json
with open(os.path.join(os.getcwd(), "index.json"), 'w') as json_file:
    json.dump([{}, [], [], None], json_file)

### crawl ####
#crawl only after the file is closed so that index() reads a complete index.json
crawl(frontier, STORAGE_LOC)
#after crawling cluster and add remaining infos to index
cluster(STORAGE_LOC)



# 2. Query Processing 
Process a textual query and return the 100 most relevant documents from your index. Please incorporate **at least one retrieval model innovation** that goes beyond BM25 or TF-IDF. Please allow for queries to be entered either individually in an interactive user interface (see also #3 below), or via a batch file containing multiple queries at once. The batch file will be formatted to have one query per line, listing the query number, and query text as tab-separated entries. An example of the batch file for the first two queries looks like this:

```
1   tübingen attractions
2   food and drinks
```
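
For the retrieval innovation requested above, one option is to diversify a score-sorted ranking across the cluster labels stored in the index. This is a minimal sketch, not the implemented method: `diversify_by_cluster` is a hypothetical name, ranking entries are assumed to be `[docID, score]` pairs sorted by descending score, and `doc_table` is assumed to follow the layout `[url, cluster, length]` per document.

```python
from collections import OrderedDict, deque


def diversify_by_cluster(ranking, doc_table, limit=100):
    """Round-robin over clusters, keeping the score order inside each cluster."""
    buckets = OrderedDict()
    for doc_id, score in ranking:
        cluster = doc_table[doc_id][1]
        buckets.setdefault(cluster, deque()).append([doc_id, score])

    diversified = []
    while buckets and len(diversified) < limit:
        # Take the best remaining document of every cluster in turn.
        for cluster in list(buckets):
            diversified.append(buckets[cluster].popleft())
            if not buckets[cluster]:
                del buckets[cluster]
            if len(diversified) == limit:
                break
    return diversified


doc_table = [["u0", "research", 10], ["u1", "research", 12],
             ["u2", "food", 8], ["u3", "research", 9]]
ranking = [[0, 4.6], [1, 4.2], [3, 4.0], [2, 3.1]]
print(diversify_by_cluster(ranking, doc_table))
# [[0, 4.6], [2, 3.1], [1, 4.2], [3, 4.0]]
```

The result is no longer sorted by score: the lower-scored document from the `food` cluster moves up to second place so that the first results cover more than one topic.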

In [60]:
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize
from nltk.stem import WordNetLemmatizer
from nltk import pos_tag
import re
import math   # used by bm25()
import json   # used to load index.json

# Process query
# remove stop words, lemmatize, etc.
def prepare_query(query):
	"""
	Function processes the query the same way documents are processed for indexing:
	it lowercases, tokenizes and lemmatizes the query and removes stop words and duplicates.
	Args:
		query: our initial query given by the user

	Returns: 
		query_tokenized: list of the processed query items

	"""
	query_tokenized= []
	stop_words = set(stopwords.words('english'))
	lemmatizer = WordNetLemmatizer()
	query_lower = query.lower()
 	
	for term, tag in pos_tag(word_tokenize(query_lower)):
		if re.compile(r'[^a-zA-Z0-9äöüÄÖÜ\s]').match(term):
			continue
		if term in stop_words:
			continue

		# split words with "-"
		if '-' in term:
			t = term.split('-')
			for i, tag in pos_tag(t):
				ltag = tag[0].lower()
				if ltag in ['a', 'r', 'n', 'v']:
					i = lemmatizer.lemmatize(i, ltag)	
					if i not in query_tokenized:
						query_tokenized.append(i)
		else: # words without "-"
			ltag = tag[0].lower()
			if ltag in ['a', 'r', 'n', 'v']:
				term = lemmatizer.lemmatize(term, ltag)	
				if term not in query_tokenized:
					query_tokenized.append(term)
	return query_tokenized

#get all the documents that include all the queryterms from our index 
def get_documents_from_index(query_tokenized, index):
	"""
	Function takes the query and the index and searches the index for documents which contain all the query terms.
	Args:
		query_tokenized: our processed query 
		index: index containing all terms and matching documents

	Returns: 
		documents: list of all documents matching our query

	"""
	documents = []
	first_document = True
	second_document = False

	for term in query_tokenized:
		# first term in query --> get all documents matching this term from index
		if term in index.keys() and first_document:   		
			first_document = False	
			documents = index[term]
			second_document = True
		# check second term. get documents from first iteration which have this term as well. use skip pointers for both lists 
		elif term in index.keys() and second_document: 
			second_document = False
			matches = []
			new_term_list = index[term]
			i = 0
			j = 0
			while i < len(documents) and j < len(new_term_list):  
				if documents[i][0] == new_term_list[j][0]:
					matches.append(documents[i])							# append entry including skip pointers for future iterations 
					i += 1
					j += 1
				elif documents[i][0] < new_term_list[j][0]:                 # A < B
					if documents[i][2]:                                     # If there is a skip pointer
						if documents[i][2][1] <= new_term_list[j][0]:       # Take it if it does not carry you beyond the id pointed to by B
							i = documents[i][2][0]
						else:                                       		# Otherwise increase the pointer by 1
							i += 1
					else:
						i += 1
				elif new_term_list[j][2]:                                   # If there is a skip pointer
					if new_term_list[j][2][1] <= documents[i][0]:           # Take it if it does not carry you beyond the id pointed to by A
						j = new_term_list[j][2][0]
					else:
						j += 1                                     			# Otherwise increase the pointer by 1
				else:
					j += 1 
			documents = matches		
			
		# check current term. get documents from previous iteration which have this term as well. use skip pointers only for list in index.
		elif term in index.keys():
			matches = []
			new_term_list = index[term]
			i = 0
			j = 0
			while i < len(documents) and j < len(new_term_list):  
				if documents[i][0] == new_term_list[j][0]:
					matches.append(documents[i])							# append entry including skip pointers for future iterations 
					i += 1
					j += 1
				elif documents[i][0] < new_term_list[j][0]:    				# A < B
					i += 1													# This time no skip pointers, because they aren't correct anymore

				elif new_term_list[j][2]:                                   # If there is a skip pointer
					if new_term_list[j][2][1] <= documents[i][0]:           # Take it if it does not carry you beyond the id pointed to by A
						j = new_term_list[j][2][0]
					else:
						j += 1                                     			# Otherwise increase the pointer by 1
				else:
					j += 1 
			documents = matches		
		else:
			return []														# query term wasn't found, no document can satisfy query

		print('documents during comparison:', documents)

	# get rid of skip pointers:
	tmp = []
	for element in documents:
		tmp.append(element[0])
	documents = tmp

	return documents

# calculate bm25 score for each document
def bm25(document, query, index, k, b, avg_length, document_length, doc_num) :
	"""
	Function calculates the bm25 score for a document-query pair.  
	Args:
		document: a document matching our query
		query: our initial query given by the user
		index: index containing all terms and matching documents
		k: parameter for optimization
		b: parameter for optimization
		avg_length: average length of all documents in our index
		document_length: length of the document
		doc_num: total number of documents in our index

	Returns: 
		score: bm25 score for our document-query pair

	"""
	score = 0
	
	#calculate score:
	for term in query:
		tf = 0
		idf = 0
		# get tf from index
		for doc in index[term]:
			if doc[0] == document:
				tf = len(doc[1])

		#calculate idf:
		idf = math.log(doc_num/len(index[term]))

		numerator = tf *(k+1)
		denominator = tf + (k*(1 - b + (b* (document_length/avg_length))))
		#add score of query-term to score:
		score += idf * (numerator/denominator)
	#print('score', score)

	return score

def ranking_sort_helper(document_and_score):
	"""
	Function returning the score of a given document, used for sorting the ranked list.
	Args:
		document_and_score: list entry with document and score pair

	Returns: 
		:score value

	"""
	return document_and_score[1]

def format_helper(ranking, url_cluster):
	result = []
	for position, entry in enumerate(ranking):
		URL = url_cluster[entry[0]][0]
		cluster = url_cluster[entry[0]][1]
		result.append([position, URL, entry[1], cluster])
	return result

# Todo:
def diversify():
	pass

#Retrieve documents relevant to a query. You need (at least) two parameters:
    #query: The user's search query
    #index: The location of the local index storing the discovered documents.
def retrieve(query, index):
	"""
	Function takes a query and index. Ranks documents matching the query for relevance.  
	Args:
		query: our initial query given by the user
		index: index containing all terms and matching documents

	Returns: 
		result_list: list of ranked documents

	"""
	ranking = []
	result_list = []


	# get processed query
	query_tokenized= prepare_query(query)		
	print('query tokenized:', query_tokenized,'\n')

	#get dictionary from index
	document_index = index[0]
	#get avg document length:
	avg_length = index[3]
	# get number of documents in index:
	num_doc = len(index[1])

	# get all the documents, that include all the terms from our query
	documents = get_documents_from_index(query_tokenized, document_index)	
	print('retrieved documents', documents,'\n')
	
	# get bm25 value for each document:
	for document in documents:
		#get length of document
		document_length = index[1][document][2]
		#print('length', document_length)

		# Todo: optimize the values for k & b
		score = bm25(document, query_tokenized, document_index ,1.5, 0.75, avg_length, document_length, num_doc)
		ranking.append([document, score])


	# Sort ranking based on bm25 score, 
	ranking.sort(key=ranking_sort_helper, reverse=True)
	print('sorted ranking', ranking)


	# diversify: use hashes or tf-idf vectors?
	# Todo: implement diversify function
	hashes = index[2]
	#diversify(....)


	#get the top 100 results into the right format for SERP visualization:
	result_list = format_helper(ranking[:100], index[1])

	return result_list

#[['D0', [Positions], [2, 'D2']], ['D1', [Positions], None], ['D2', [Positions], None], ['D14', [Positions], None]]
test_index = ({'tübingen': [[0,[1,2,3],[2,2]],[1,[1,2,3],None],[2,[1,2,3],None], [14,[1,2,3],None]],
			 'henry': [[0,[1,2,3],None],[1,[1,2,3],None],[2,[1,2,3],None],[12,[1,2,3],None],[13,[1,2,3],None], [14,[1,2,3],None]],
			 'attraction': [[0,[1,2,3],[2,12]],[3,[1,2,3],[3,14]],[12,[1,2,3],None],[14,[1,2,3],None]]},
			 [['URL0'],['URL1'],['URL2']],
			 ['Hash1','Hash1'],
			 1234)

with open('index.json', encoding='utf-8') as file:
    index = json.load(file)

retrieve('tübingen location', index)

query tokenized: ['tübingen', 'location'] 

documents during comparison: [[0, [0, 34, 54, 61, 117, 126], [20, 20], 0.24823038574394468], [1, [45, 509, 994, 1043, 1093, 1121, 1150], None, 0.24823038574394468], [2, [2, 27], None, 0.24823038574394468], [3, [8, 19, 80, 112, 200, 283], None, 0.24823038574394468], [4, [239, 278], None, 0.24823038574394468], [5, [2, 6, 159, 167, 174, 193, 202, 214, 307, 421, 477, 527, 551, 593, 643, 673, 699, 721, 755, 963, 1031, 1120, 1209, 1284, 1341, 1380, 1400, 1424, 1438, 1467, 1493, 1595, 1632, 1666, 1729, 1936, 2018, 2477, 2547, 2575, 2633, 2693, 2749, 2766, 2790, 2812, 2970, 3093, 3168, 3214, 3314, 3388, 3403, 3645, 3708, 3770, 3809, 3905, 3967, 3990, 4143, 4150, 4239, 4261, 4271, 4299, 4314, 4330, 4356, 4366, 4398, 4401, 4438, 4463, 4477, 4490, 4503, 4509, 4525, 4543, 4559, 4571, 4587, 4605, 4621, 4642, 4793, 4814, 4832, 5002, 5007, 5019, 5031, 5082, 5098, 5114, 5164], None, 0.24823038574394468], [6, [0, 46, 56, 92, 110, 118, 124, 148, 196, 235, 255, 

[[0,
  'https://uni-tuebingen.de/en/universitaet/campusleben/veranstaltungen/veranstaltungskalender/',
  4.626944123897574,
  'research'],
 [1,
  'https://tuebingenresearchcampus.com/en/campus/tuebingen-research-campus',
  4.259755383745551,
  'research'],
 [2,
  'https://uni-tuebingen.de/en/international/research/',
  4.220768241924447,
  'research'],
 [3,
  'https://en.wikipedia.org/wiki/University_Library_of_T%C3%BCbingen',
  4.177605771069259,
  'research'],
 [4,
  'https://uni-tuebingen.de/en/staff/use-of-rooms/',
  4.062648457502892,
  'research'],
 [5,
  'https://uni-tuebingen.de/en/international/university/profile/',
  4.019310667521803,
  'research'],
 [6,
  'https://tuebingenresearchcampus.com/en/research-in-tuebingen',
  3.97797687501452,
  'research'],
 [7,
  'https://uni-tuebingen.de/en/facilities/administration/staff-units/public-relations-department/marketing/conference-and-presentation-materials/',
  3.6037926396003597,
  'research'],
 [8,
  'https://tuebingenresearchca

# 3. Search Result Presentation
Once you have a result set, we want to return it to the searcher in two ways: a) in an interactive user interface. For this user interface, please think of **at least one innovation** that goes beyond the traditional 10-blue-links interface that most commercial search engines employ. b) as a text file used for batch performance evaluation. The text file should be formatted to produce one ranked result per line, listing the query number, rank position, document URL and relevance score as tab-separated entries. An example of the first three lines of such a text file looks like this:

```
1   1   https://www.tuebingen.de/en/3521.html   0.725
1   2   https://www.komoot.com/guide/355570/castles-in-tuebingen-district   0.671
1   3   https://www.unimuseum.uni-tuebingen.de/en/museum-at-hohentuebingen-castle   0.529
...
1   100 https://www.tuebingen.de/en/3536.html   0.178
2   1   https://www.tuebingen.de/en/3773.html   0.956
2   2   https://www.tuebingen.de/en/4456.html   0.797
...
```
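
A single line in this format can be produced as follows. This is a minimal sketch; `format_result_line` is a hypothetical helper, and rounding the score to three decimals mirrors the example above rather than a stated requirement.

```python
def format_result_line(query_num, rank, url, score):
    """Return one tab-separated result line: query number, rank, URL, score."""
    return f"{query_num}\t{rank}\t{url}\t{score:.3f}"


line = format_result_line(1, 1, "https://www.tuebingen.de/en/3521.html", 0.725)
print(line.split("\t"))
# ['1', '1', 'https://www.tuebingen.de/en/3521.html', '0.725']
```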

In [6]:
#TODO: Implement an interactive user interface for part a of this exercise.
#app.py

#Produce a text file with 100 results per query in the format specified above.
def load_queries(query_file_path):
    queries = []
    with open(query_file_path, 'r', encoding='utf-8') as file:
        for line in file:
            parts = line.strip().split('\t')
            if len(parts) == 2:
                query_num = int(parts[0])
                query_desc = parts[1]
                queries.append((query_num, query_desc))
    return queries

def batch(query_file_path, output_file_path):
    queries = load_queries(query_file_path)
    
    with open(output_file_path, 'w', encoding='utf-8') as output_file:
        for query_num, query_desc in queries:
            results = retrieve(query_desc, index)
            for rank, result in enumerate(results, start=1):
                # retrieve() returns entries of the form [position, URL, score, cluster]
                result_line = f"{query_num}\t{rank}\t{result[1]}\t{result[2]}"
                output_file.write(result_line + '\n')

query_file_path = 'queries.txt'
output_file_path = 'batch_results_Sysala_Hirsch_Wenninger_Moser.txt'
batch(query_file_path, output_file_path)

In [75]:
from flask import Flask, render_template, request, redirect, url_for
import json
import logging
import os
from collections import Counter

app = Flask(__name__)

# Configure logging
logging.basicConfig(level=logging.DEBUG)

def load_json_file(file_path):
    try:
        with open(file_path, 'r', encoding='utf-8') as f:
            return json.load(f)
    except Exception as e:
        logging.error(f"Error loading JSON from {file_path}: {e}")
    return {}

# Get the Image of each document
def find_image(index):
    supported_formats = ['.svg', '.png', '.jpg', '.jpeg']
    for fmt in supported_formats:
        img_path = f'static/pictures/{index}{fmt}'
        if os.path.exists(img_path):
            return img_path
    return 'static/pictures/default_picture.jpg'

# Load the category definitions and the document index
categories = load_json_file('static/categories.json')
document_index = load_json_file('index.json')

# Get title and description
def get_info(index):
    json_path = f'static/documents/{index}.json'
    data = load_json_file(json_path)
    title = data.get('title', 'No title')
    description = data.get('description', 'No description')
    image_url = find_image(index)
    return title, description, image_url

@app.route('/')
def index():
    return render_template('index.html', categories=categories)

@app.route('/search', methods=['GET', 'POST'])
def search():
    query = request.args.get('query') if request.method == 'GET' else request.form['query']
    ranklist = retrieve(query, document_index)
    selected_filter = request.args.get('filter') if request.method == 'GET' else request.form.get('filter')
    filter_count = Counter([result[3] for result in ranklist])
    top_filters = [word for word, count in filter_count.most_common(5)]

    results = []
    for result in ranklist:
        if selected_filter and result[3] != selected_filter:
            continue
        idx = result[0]
        title, description, image_url = get_info(idx)
        results.append({'url': result[1], 'title': title, 'snippet': description, 'image': image_url})

    return render_template('results.html', query=query, results=results, top_filters=top_filters, categories=categories)


@app.route('/filter', methods=['POST'])
def filter():
    selected_filter = request.form['filter']
    query = request.form['query']
    return redirect(url_for('search', query=query, filter=selected_filter))

if __name__ == '__main__':
    app.run(debug=False)
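
The filter handling in `search()` can be tried out without starting the server. The sketch below isolates it as a pure function. It assumes, as the route does, that each entry of the ranked list carries its URL at position 1 and its category at position 3; treating position 2 as the score is a guess. The function name `apply_filter` and the sample data are made up for illustration.

```python
from collections import Counter


def apply_filter(ranklist, selected_filter=None, top_n=5):
    """Return (filtered results, most common categories) for a ranked list.

    Each entry is assumed to look like [doc_id, url, score, category].
    """
    # Count categories over the unfiltered list so the offered filters stay stable
    filter_count = Counter(result[3] for result in ranklist)
    top_filters = [category for category, _ in filter_count.most_common(top_n)]

    if selected_filter:
        filtered = [r for r in ranklist if r[3] == selected_filter]
    else:
        filtered = list(ranklist)
    return filtered, top_filters


# Hypothetical ranked list, not taken from the real index
ranklist = [
    [0, "https://example.org/a", 0.9, "research"],
    [1, "https://example.org/b", 0.8, "food"],
    [2, "https://example.org/c", 0.7, "research"],
]
filtered, top_filters = apply_filter(ranklist, selected_filter="research")
print([r[0] for r in filtered])
print(top_filters)
```

Keeping this logic separate from the route would also make it straightforward to unit-test the filter behaviour independently of Flask.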


 * Serving Flask app '__main__'
 * Debug mode: off


 * Running on http://127.0.0.1:5000
INFO:werkzeug:Press CTRL+C to quit
INFO:werkzeug:127.0.0.1 - - [19/Jul/2024 15:20:37] "POST /search HTTP/1.1" 200 -
INFO:werkzeug:127.0.0.1 - - [19/Jul/2024 15:20:37] "GET /static/styles.css HTTP/1.1" 200 -
INFO:werkzeug:127.0.0.1 - - [19/Jul/2024 15:20:37] "GET /static/pictures/1.svg HTTP/1.1" 200 -
INFO:werkzeug:127.0.0.1 - - [19/Jul/2024 15:20:37] "GET /static/pictures/4.jpg HTTP/1.1" 200 -
INFO:werkzeug:127.0.0.1 - - [19/Jul/2024 15:20:37] "GET /static/pictures/3.svg HTTP/1.1" 200 -
INFO:werkzeug:127.0.0.1 - - [19/Jul/2024 15:20:37] "GET /static/pictures/0.svg HTTP/1.1" 200 -
INFO:werkzeug:127.0.0.1 - - [19/Jul/2024 15:20:37] "GET /static/pictures/2.svg HTTP/1.1" 200 -
INFO:werkzeug:127.0.0.1 - - [19/Jul/2024 15:20:37] "GET /favicon.ico HTTP/1.1" 404 -


index <class 'list'>
query <class 'str'>
query tokenized: ['food', 'drink'] 

documents during comparison: [[34, [147, 191], [5, 64], 3.100092288878234], [35, [307], None, 3.100092288878234], [48, [107, 126], None, 3.100092288878234], [50, [291, 555, 604, 654, 659], None, 3.100092288878234], [59, [606], None, 3.100092288878234], [64, [990], [10, 119], 3.100092288878234], [72, [1398], None, 3.100092288878234], [80, [518], None, 3.100092288878234], [95, [592], None, 3.100092288878234], [113, [335], None, 3.100092288878234], [119, [973], [15, 461], 3.100092288878234], [155, [2], None, 3.100092288878234], [289, [3016], None, 3.100092288878234], [447, [668, 1000], None, 3.100092288878234], [448, [446, 661, 686], None, 3.100092288878234], [461, [210, 221, 313], [20, 499], 3.100092288878234], [479, [1132, 1138], None, 3.100092288878234], [487, [2267], None, 3.100092288878234], [489, [6852, 10985, 10988, 15945, 21788, 21853, 22063], None, 3.100092288878234], [491, [7641], None, 3.10009228887823

INFO:werkzeug:127.0.0.1 - - [19/Jul/2024 15:21:00] "POST /filter HTTP/1.1" 302 -
INFO:werkzeug:127.0.0.1 - - [19/Jul/2024 15:21:00] "GET /search?query=Food+and+Drinks&filter=research HTTP/1.1" 200 -
INFO:werkzeug:127.0.0.1 - - [19/Jul/2024 15:21:00] "GET /static/styles.css HTTP/1.1" 304 -
INFO:werkzeug:127.0.0.1 - - [19/Jul/2024 15:21:00] "GET /static/pictures/0.svg HTTP/1.1" 304 -
INFO:werkzeug:127.0.0.1 - - [19/Jul/2024 15:21:00] "GET /static/pictures/1.svg HTTP/1.1" 304 -
INFO:werkzeug:127.0.0.1 - - [19/Jul/2024 15:21:00] "GET /static/pictures/2.svg HTTP/1.1" 304 -
INFO:werkzeug:127.0.0.1 - - [19/Jul/2024 15:21:00] "GET /static/pictures/3.svg HTTP/1.1" 304 -
INFO:werkzeug:127.0.0.1 - - [19/Jul/2024 15:21:00] "GET /static/pictures/4.jpg HTTP/1.1" 304 -


INFO:werkzeug:127.0.0.1 - - [19/Jul/2024 15:21:12] "POST /search HTTP/1.1" 200 -
INFO:werkzeug:127.0.0.1 - - [19/Jul/2024 15:21:12] "GET /static/styles.css HTTP/1.1" 304 -
INFO:werkzeug:127.0.0.1 - - [19/Jul/2024 15:21:12] "GET /static/pictures/6.svg HTTP/1.1" 200 -
INFO:werkzeug:127.0.0.1 - - [19/Jul/2024 15:21:12] "GET /static/pictures/7.svg HTTP/1.1" 200 -
INFO:werkzeug:127.0.0.1 - - [19/Jul/2024 15:21:12] "GET /static/pictures/5.png HTTP/1.1" 200 -
INFO:werkzeug:127.0.0.1 - - [19/Jul/2024 15:21:12] "GET /static/pictures/8.svg HTTP/1.1" 200 -
INFO:werkzeug:127.0.0.1 - - [19/Jul/2024 15:21:12] "GET /static/pictures/9.svg HTTP/1.1" 200 -
INFO:werkzeug:127.0.0.1 - - [19/Jul/2024 15:21:12] "GET /static/pictures/10.svg HTTP/1.1" 200 -
INFO:werkzeug:127.0.0.1 - - [19/Jul/2024 15:21:12] "GET /static/pictures/11.svg HTTP/1.1" 200 -
INFO:werkzeug:127.0.0.1 - - [19/Jul/2024 15:21:12] "GET /static/pictures/12.svg HTTP/1.1" 200 -
INFO:werkzeug:127.0.0.1 - - [19/Jul/2024 15:21:12] "GET 

index <class 'list'>
query <class 'str'>
query tokenized: ['tübingen', 'university'] 

documents during comparison: [[0, [0, 34, 54, 61, 117, 126], [20, 20], 0.24823038574394468], [1, [45, 509, 994, 1043, 1093, 1121, 1150], None, 0.24823038574394468], [2, [2, 27], None, 0.24823038574394468], [3, [8, 19, 80, 112, 200, 283], None, 0.24823038574394468], [4, [239, 278], None, 0.24823038574394468], [5, [2, 6, 159, 167, 174, 193, 202, 214, 307, 421, 477, 527, 551, 593, 643, 673, 699, 721, 755, 963, 1031, 1120, 1209, 1284, 1341, 1380, 1400, 1424, 1438, 1467, 1493, 1595, 1632, 1666, 1729, 1936, 2018, 2477, 2547, 2575, 2633, 2693, 2749, 2766, 2790, 2812, 2970, 3093, 3168, 3214, 3314, 3388, 3403, 3645, 3708, 3770, 3809, 3905, 3967, 3990, 4143, 4150, 4239, 4261, 4271, 4299, 4314, 4330, 4356, 4366, 4398, 4401, 4438, 4463, 4477, 4490, 4503, 4509, 4525, 4543, 4559, 4571, 4587, 4605, 4621, 4642, 4793, 4814, 4832, 5002, 5007, 5019, 5031, 5082, 5098, 5114, 5164], None, 0.24823038574394468], [6, [0, 46, 

INFO:werkzeug:127.0.0.1 - - [19/Jul/2024 15:22:33] "GET /static/pictures/18.svg HTTP/1.1" 200 -
INFO:werkzeug:127.0.0.1 - - [19/Jul/2024 15:22:33] "GET /static/pictures/19.svg HTTP/1.1" 200 -
INFO:werkzeug:127.0.0.1 - - [19/Jul/2024 15:22:33] "GET /static/pictures/20.svg HTTP/1.1" 200 -
INFO:werkzeug:127.0.0.1 - - [19/Jul/2024 15:22:33] "GET /static/pictures/21.svg HTTP/1.1" 200 -
INFO:werkzeug:127.0.0.1 - - [19/Jul/2024 15:22:33] "GET /static/pictures/22.svg HTTP/1.1" 200 -
INFO:werkzeug:127.0.0.1 - - [19/Jul/2024 15:22:33] "GET /static/pictures/24.svg HTTP/1.1" 200 -
INFO:werkzeug:127.0.0.1 - - [19/Jul/2024 15:22:33] "GET /static/pictures/0.svg HTTP/1.1" 304 -
INFO:werkzeug:127.0.0.1 - - [19/Jul/2024 15:22:33] "GET /static/pictures/25.svg HTTP/1.1" 200 -
INFO:werkzeug:127.0.0.1 - - [19/Jul/2024 15:22:33] "GET /static/pictures/26.svg HTTP/1.1" 200 -
INFO:werkzeug:127.0.0.1 - - [19/Jul/2024 15:22:33] "GET /static/pictures/27.svg HTTP/1.1" 200 -
INFO:werkzeug:127.0.0.1 - - [19/

INFO:werkzeug:127.0.0.1 - - [19/Jul/2024 15:22:34] "GET /static/pictures/6.svg HTTP/1.1" 304 -
INFO:werkzeug:127.0.0.1 - - [19/Jul/2024 15:22:34] "GET /static/pictures/7.svg HTTP/1.1" 304 -
INFO:werkzeug:127.0.0.1 - - [19/Jul/2024 15:22:34] "GET /static/pictures/8.svg HTTP/1.1" 304 -
INFO:werkzeug:127.0.0.1 - - [19/Jul/2024 15:22:34] "GET /static/pictures/9.svg HTTP/1.1" 304 -
INFO:werkzeug:127.0.0.1 - - [19/Jul/2024 15:22:34] "GET /static/pictures/10.svg HTTP/1.1" 304 -
INFO:werkzeug:127.0.0.1 - - [19/Jul/2024 15:22:34] "GET /static/pictures/11.svg HTTP/1.1" 200 -
INFO:werkzeug:127.0.0.1 - - [19/Jul/2024 15:22:34] "GET /static/pictures/12.svg HTTP/1.1" 200 -
INFO:werkzeug:127.0.0.1 - - [19/Jul/2024 15:22:34] "GET /static/pictures/13.svg HTTP/1.1" 200 -
INFO:werkzeug:127.0.0.1 - - [19/Jul/2024 15:22:34] "GET /static/pictures/14.svg HTTP/1.1" 200 -
INFO:werkzeug:127.0.0.1 - - [19/Jul/2024 15:22:34] "GET /static/pictures/15.svg HTTP/1.1

INFO:werkzeug:127.0.0.1 - - [19/Jul/2024 15:22:35] "GET /static/pictures/8.svg HTTP/1.1" 304 -
INFO:werkzeug:127.0.0.1 - - [19/Jul/2024 15:22:35] "GET /static/pictures/9.svg HTTP/1.1" 304 -
INFO:werkzeug:127.0.0.1 - - [19/Jul/2024 15:22:35] "GET /static/pictures/10.svg HTTP/1.1" 304 -
INFO:werkzeug:127.0.0.1 - - [19/Jul/2024 15:22:35] "GET /static/pictures/11.svg HTTP/1.1" 304 -
INFO:werkzeug:127.0.0.1 - - [19/Jul/2024 15:22:35] "GET /static/pictures/12.svg HTTP/1.1" 304 -
INFO:werkzeug:127.0.0.1 - - [19/Jul/2024 15:22:35] "GET /static/pictures/13.svg HTTP/1.1" 304 -
INFO:werkzeug:127.0.0.1 - - [19/Jul/2024 15:22:35] "GET /static/pictures/14.svg HTTP/1.1" 304 -
INFO:werkzeug:127.0.0.1 - - [19/Jul/2024 15:22:35] "GET /static/pictures/15.svg HTTP/1.1" 304 -
INFO:werkzeug:127.0.0.1 - - [19/Jul/2024 15:22:35] "GET /static/pictures/16.svg HTTP/1.1" 304 -
INFO:werkzeug:127.0.0.1 - - [19/Jul/2024 15:22:35] "

INFO:werkzeug:127.0.0.1 - - [19/Jul/2024 15:22:36] "GET /static/pictures/0.svg HTTP/1.1" 304 -
INFO:werkzeug:127.0.0.1 - - [19/Jul/2024 15:22:36] "GET /static/pictures/1.svg HTTP/1.1" 304 -
INFO:werkzeug:127.0.0.1 - - [19/Jul/2024 15:22:36] "GET /static/pictures/2.svg HTTP/1.1" 304 -
INFO:werkzeug:127.0.0.1 - - [19/Jul/2024 15:22:36] "GET /static/pictures/3.svg HTTP/1.1" 304 -
INFO:werkzeug:127.0.0.1 - - [19/Jul/2024 15:22:36] "GET /static/pictures/4.jpg HTTP/1.1" 304 -
INFO:werkzeug:127.0.0.1 - - [19/Jul/2024 15:22:36] "GET /static/pictures/5.png HTTP/1.1" 304 -
INFO:werkzeug:127.0.0.1 - - [19/Jul/2024 15:22:36] "GET /static/pictures/6.svg HTTP/1.1" 304 -
INFO:werkzeug:127.0.0.1 - - [19/Jul/2024 15:22:36] "GET /static/pictures/7.svg HTTP/1.1" 304 -
INFO:werkzeug:127.0.0.1 - - [19/Jul/2024 15:22:36] "GET /static/pictures/8.svg HTTP/1.1" 304 -
INFO:werkzeug:127.0.0.1 - - [19/Jul/2024 15:22:36] "GET /st

INFO:werkzeug:127.0.0.1 - - [19/Jul/2024 15:22:37] "GET /static/pictures/9.svg HTTP/1.1" 304 -
INFO:werkzeug:127.0.0.1 - - [19/Jul/2024 15:22:37] "GET /static/pictures/10.svg HTTP/1.1" 304 -
INFO:werkzeug:127.0.0.1 - - [19/Jul/2024 15:22:37] "GET /static/pictures/11.svg HTTP/1.1" 304 -
INFO:werkzeug:127.0.0.1 - - [19/Jul/2024 15:22:37] "GET /static/pictures/12.svg HTTP/1.1" 304 -
INFO:werkzeug:127.0.0.1 - - [19/Jul/2024 15:22:37] "GET /static/pictures/13.svg HTTP/1.1" 304 -
INFO:werkzeug:127.0.0.1 - - [19/Jul/2024 15:22:37] "GET /static/pictures/14.svg HTTP/1.1" 304 -
INFO:werkzeug:127.0.0.1 - - [19/Jul/2024 15:22:37] "GET /static/pictures/15.svg HTTP/1.1" 304 -
INFO:werkzeug:127.0.0.1 - - [19/Jul/2024 15:22:37] "GET /static/pictures/16.svg HTTP/1.1" 304 -
INFO:werkzeug:127.0.0.1 - - [19/Jul/2024 15:22:37] "GET /static/pictures/17.svg HTTP/1.1" 304 -
INFO:werkzeug:127.0.0.1 - - [19/Jul/2024 15:22:37] "

INFO:werkzeug:127.0.0.1 - - [19/Jul/2024 15:22:38] "GET /static/pictures/9.svg HTTP/1.1" 200 -
INFO:werkzeug:127.0.0.1 - - [19/Jul/2024 15:22:38] "GET /static/pictures/10.svg HTTP/1.1" 200 -
INFO:werkzeug:127.0.0.1 - - [19/Jul/2024 15:22:38] "GET /static/pictures/11.svg HTTP/1.1" 200 -
INFO:werkzeug:127.0.0.1 - - [19/Jul/2024 15:22:38] "GET /static/pictures/12.svg HTTP/1.1" 200 -
INFO:werkzeug:127.0.0.1 - - [19/Jul/2024 15:22:38] "GET /static/pictures/13.svg HTTP/1.1" 200 -
INFO:werkzeug:127.0.0.1 - - [19/Jul/2024 15:22:38] "GET /static/pictures/14.svg HTTP/1.1" 200 -
INFO:werkzeug:127.0.0.1 - - [19/Jul/2024 15:22:38] "GET /static/pictures/15.svg HTTP/1.1" 200 -
INFO:werkzeug:127.0.0.1 - - [19/Jul/2024 15:22:38] "GET /static/pictures/16.svg HTTP/1.1" 200 -
INFO:werkzeug:127.0.0.1 - - [19/Jul/2024 15:22:38] "GET /static/pictures/17.svg HTTP/1.1" 200 -
INFO:werkzeug:127.0.0.1 - - [19/Jul/2024 15:22:38] "GET /static/pictures/18.svg HTTP/1.1" 200 -
INFO:werkzeug:127.0.0.1 - - [19/Jul/2024 

INFO:werkzeug:127.0.0.1 - - [19/Jul/2024 15:22:39] "GET /static/pictures/11.svg HTTP/1.1" 304 -
INFO:werkzeug:127.0.0.1 - - [19/Jul/2024 15:22:39] "GET /static/pictures/12.svg HTTP/1.1" 304 -
INFO:werkzeug:127.0.0.1 - - [19/Jul/2024 15:22:39] "GET /static/pictures/13.svg HTTP/1.1" 304 -
INFO:werkzeug:127.0.0.1 - - [19/Jul/2024 15:22:39] "GET /static/pictures/14.svg HTTP/1.1" 304 -
INFO:werkzeug:127.0.0.1 - - [19/Jul/2024 15:22:39] "GET /static/pictures/15.svg HTTP/1.1" 304 -
INFO:werkzeug:127.0.0.1 - - [19/Jul/2024 15:22:39] "GET /static/pictures/16.svg HTTP/1.1" 304 -
INFO:werkzeug:127.0.0.1 - - [19/Jul/2024 15:22:39] "GET /static/pictures/18.svg HTTP/1.1" 304 -
INFO:werkzeug:127.0.0.1 - - [19/Jul/2024 15:22:39] "GET /static/pictures/17.svg HTTP/1.1" 304 -
INFO:werkzeug:127.0.0.1 - - [19/Jul/2024 15:22:39] "GET /static/pictures/19.svg HTTP/1.1" 304 -
INFO:werkzeug:127.0.0.1 - - [19/Jul/2024 15:22:39] "

INFO:werkzeug:127.0.0.1 - - [19/Jul/2024 15:22:40] "GET /static/pictures/8.svg HTTP/1.1" 304 -
INFO:werkzeug:127.0.0.1 - - [19/Jul/2024 15:22:40] "GET /static/pictures/9.svg HTTP/1.1" 304 -
INFO:werkzeug:127.0.0.1 - - [19/Jul/2024 15:22:40] "GET /static/pictures/10.svg HTTP/1.1" 304 -
INFO:werkzeug:127.0.0.1 - - [19/Jul/2024 15:22:40] "GET /static/pictures/11.svg HTTP/1.1" 304 -
INFO:werkzeug:127.0.0.1 - - [19/Jul/2024 15:22:40] "GET /static/pictures/12.svg HTTP/1.1" 304 -
INFO:werkzeug:127.0.0.1 - - [19/Jul/2024 15:22:40] "GET /static/pictures/13.svg HTTP/1.1" 304 -
INFO:werkzeug:127.0.0.1 - - [19/Jul/2024 15:22:40] "GET /static/pictures/14.svg HTTP/1.1" 304 -
INFO:werkzeug:127.0.0.1 - - [19/Jul/2024 15:22:40] "GET /static/pictures/15.svg HTTP/1.1" 304 -
INFO:werkzeug:127.0.0.1 - - [19/Jul/2024 15:22:40] "GET /static/pictures/16.svg HTTP/1.1" 304 -
INFO:werkzeug:127.0.0.1 - - [19/Jul/2024 15:22:40] "

INFO:werkzeug:127.0.0.1 - - [19/Jul/2024 15:22:41] "GET /static/pictures/9.svg HTTP/1.1" 304 -
INFO:werkzeug:127.0.0.1 - - [19/Jul/2024 15:22:41] "GET /static/pictures/10.svg HTTP/1.1" 304 -
INFO:werkzeug:127.0.0.1 - - [19/Jul/2024 15:22:41] "GET /static/pictures/11.svg HTTP/1.1" 304 -
INFO:werkzeug:127.0.0.1 - - [19/Jul/2024 15:22:41] "GET /static/pictures/12.svg HTTP/1.1" 304 -
INFO:werkzeug:127.0.0.1 - - [19/Jul/2024 15:22:41] "GET /static/pictures/13.svg HTTP/1.1" 304 -
INFO:werkzeug:127.0.0.1 - - [19/Jul/2024 15:22:41] "GET /static/pictures/14.svg HTTP/1.1" 304 -
INFO:werkzeug:127.0.0.1 - - [19/Jul/2024 15:22:41] "GET /static/pictures/15.svg HTTP/1.1" 304 -
INFO:werkzeug:127.0.0.1 - - [19/Jul/2024 15:22:41] "GET /static/pictures/16.svg HTTP/1.1" 304 -
INFO:werkzeug:127.0.0.1 - - [19/Jul/2024 15:22:41] "GET /static/pictures/17.svg HTTP/1.1" 304 -
INFO:werkzeug:127.0.0.1 - - [19/Jul/2024 15:22:41] "

INFO:werkzeug:127.0.0.1 - - [19/Jul/2024 15:22:47] "GET /static/pictures/3.svg HTTP/1.1" 304 -
INFO:werkzeug:127.0.0.1 - - [19/Jul/2024 15:22:47] "GET /static/pictures/4.jpg HTTP/1.1" 304 -
INFO:werkzeug:127.0.0.1 - - [19/Jul/2024 15:22:47] "GET /static/pictures/5.png HTTP/1.1" 304 -
INFO:werkzeug:127.0.0.1 - - [19/Jul/2024 15:22:47] "GET /static/pictures/6.svg HTTP/1.1" 304 -
INFO:werkzeug:127.0.0.1 - - [19/Jul/2024 15:22:47] "GET /static/pictures/7.svg HTTP/1.1" 304 -
INFO:werkzeug:127.0.0.1 - - [19/Jul/2024 15:22:47] "GET /static/pictures/8.svg HTTP/1.1" 304 -
INFO:werkzeug:127.0.0.1 - - [19/Jul/2024 15:22:47] "GET /static/pictures/9.svg HTTP/1.1" 304 -
INFO:werkzeug:127.0.0.1 - - [19/Jul/2024 15:22:47] "GET /static/pictures/10.svg HTTP/1.1" 304 -
INFO:werkzeug:127.0.0.1 - - [19/Jul/2024 15:22:47] "GET /static/pictures/11.svg HTTP/1.1" 304 -
INFO:werkzeug:127.0.0.1 - - [19/Jul/2024 15:22:47] "GET /

INFO:werkzeug:127.0.0.1 - - [19/Jul/2024 15:22:48] "GET /static/pictures/8.svg HTTP/1.1" 304 -
INFO:werkzeug:127.0.0.1 - - [19/Jul/2024 15:22:48] "GET /static/pictures/9.svg HTTP/1.1" 304 -
INFO:werkzeug:127.0.0.1 - - [19/Jul/2024 15:22:48] "GET /static/pictures/10.svg HTTP/1.1" 304 -
INFO:werkzeug:127.0.0.1 - - [19/Jul/2024 15:22:48] "GET /static/pictures/11.svg HTTP/1.1" 304 -
INFO:werkzeug:127.0.0.1 - - [19/Jul/2024 15:22:48] "GET /static/pictures/12.svg HTTP/1.1" 304 -
INFO:werkzeug:127.0.0.1 - - [19/Jul/2024 15:22:48] "GET /static/pictures/13.svg HTTP/1.1" 304 -
INFO:werkzeug:127.0.0.1 - - [19/Jul/2024 15:22:48] "GET /static/pictures/14.svg HTTP/1.1" 304 -
INFO:werkzeug:127.0.0.1 - - [19/Jul/2024 15:22:48] "GET /static/pictures/15.svg HTTP/1.1" 304 -
INFO:werkzeug:127.0.0.1 - - [19/Jul/2024 15:22:48] "GET /static/pictures/16.svg HTTP/1.1" 304 -
INFO:werkzeug:127.0.0.1 - - [19/Jul/2024 15:22:48] "

INFO:werkzeug:127.0.0.1 - - [19/Jul/2024 15:22:50] "GET /static/pictures/10.svg HTTP/1.1" 304 -
INFO:werkzeug:127.0.0.1 - - [19/Jul/2024 15:22:50] "GET /static/pictures/11.svg HTTP/1.1" 304 -
INFO:werkzeug:127.0.0.1 - - [19/Jul/2024 15:22:50] "GET /static/pictures/12.svg HTTP/1.1" 304 -
INFO:werkzeug:127.0.0.1 - - [19/Jul/2024 15:22:50] "GET /static/pictures/13.svg HTTP/1.1" 304 -
INFO:werkzeug:127.0.0.1 - - [19/Jul/2024 15:22:50] "GET /static/pictures/14.svg HTTP/1.1" 304 -
INFO:werkzeug:127.0.0.1 - - [19/Jul/2024 15:22:50] "GET /static/pictures/15.svg HTTP/1.1" 304 -
INFO:werkzeug:127.0.0.1 - - [19/Jul/2024 15:22:50] "GET /static/pictures/16.svg HTTP/1.1" 304 -
INFO:werkzeug:127.0.0.1 - - [19/Jul/2024 15:22:50] "GET /static/pictures/17.svg HTTP/1.1" 304 -
INFO:werkzeug:127.0.0.1 - - [19/Jul/2024 15:22:50] "GET /static/pictures/18.svg HTTP/1.1" 304 -
INFO:werkzeug:127.0.0.1 - - [19/Jul/2024 15:22:50] "

INFO:werkzeug:127.0.0.1 - - [19/Jul/2024 15:22:51] "GET /static/pictures/8.svg HTTP/1.1" 304 -
INFO:werkzeug:127.0.0.1 - - [19/Jul/2024 15:22:51] "GET /static/pictures/9.svg HTTP/1.1" 304 -
INFO:werkzeug:127.0.0.1 - - [19/Jul/2024 15:22:51] "GET /static/pictures/10.svg HTTP/1.1" 304 -
INFO:werkzeug:127.0.0.1 - - [19/Jul/2024 15:22:51] "GET /static/pictures/11.svg HTTP/1.1" 304 -
INFO:werkzeug:127.0.0.1 - - [19/Jul/2024 15:22:51] "GET /static/pictures/12.svg HTTP/1.1" 304 -
INFO:werkzeug:127.0.0.1 - - [19/Jul/2024 15:22:51] "GET /static/pictures/13.svg HTTP/1.1" 304 -
INFO:werkzeug:127.0.0.1 - - [19/Jul/2024 15:22:51] "GET /static/pictures/14.svg HTTP/1.1" 304 -
INFO:werkzeug:127.0.0.1 - - [19/Jul/2024 15:22:51] "GET /static/pictures/15.svg HTTP/1.1" 304 -
INFO:werkzeug:127.0.0.1 - - [19/Jul/2024 15:22:51] "GET /static/pictures/16.svg HTTP/1.1" 304 -
INFO:werkzeug:127.0.0.1 - - [19/Jul/2024 15:22:51] "

index <class 'list'>
query <class 'str'>
query tokenized: ['aaa'] 

dokumente währen vergleich: [[483, [3066], None, 5.625820933186489], [484, [221], None, 5.625820933186489]]
erhaltene Dokumente [483, 484] 

sorted ranking [[484, 8.702995604120071], [483, 3.1724573711876456]]


index <class 'list'>
query <class 'str'>
query tokenized: ['tübingen', 'attraction'] 

dokumente währen vergleich: [[0, [0, 34, 54, 61, 117, 126], [20, 20], 0.24823038574394468], [1, [45, 509, 994, 1043, 1093, 1121, 1150], None, 0.24823038574394468], [2, [2, 27], None, 0.24823038574394468], [3, [8, 19, 80, 112, 200, 283], None, 0.24823038574394468], [4, [239, 278], None, 0.24823038574394468], [5, [2, 6, 159, 167, 174, 193, 202, 214, 307, 421, 477, 527, 551, 593, 643, 673, 699, 721, 755, 963, 1031, 1120, 1209, 1284, 1341, 1380, 1400, 1424, 1438, 1467, 1493, 1595, 1632, 1666, 1729, 1936, 2018, 2477, 2547, 2575, 2633, 2693, 2749, 2766, 2790, 2812, 2970, 3093, 3168, 3214, 3314, 3388, 3403, 3645, 3708, 3770, 3809, 3905, 3967, 3990, 4143, 4150, 4239, 4261, 4271, 4299, 4314, 4330, 4356, 4366, 4398, 4401, 4438, 4463, 4477, 4490, 4503, 4509, 4525, 4543, 4559, 4571, 4587, 4605, 4621, 4642, 4793, 4814, 4832, 5002, 5007, 5019, 5031, 5082, 5098, 5114, 5164], None, 0.24823038574394468], [6, [0, 46, 

index <class 'list'>
query <class 'str'>
query tokenized: ['food', 'drink'] 

dokumente währen vergleich: [[34, [147, 191], [5, 64], 3.100092288878234], [35, [307], None, 3.100092288878234], [48, [107, 126], None, 3.100092288878234], [50, [291, 555, 604, 654, 659], None, 3.100092288878234], [59, [606], None, 3.100092288878234], [64, [990], [10, 119], 3.100092288878234], [72, [1398], None, 3.100092288878234], [80, [518], None, 3.100092288878234], [95, [592], None, 3.100092288878234], [113, [335], None, 3.100092288878234], [119, [973], [15, 461], 3.100092288878234], [155, [2], None, 3.100092288878234], [289, [3016], None, 3.100092288878234], [447, [668, 1000], None, 3.100092288878234], [448, [446, 661, 686], None, 3.100092288878234], [461, [210, 221, 313], [20, 499], 3.100092288878234], [479, [1132, 1138], None, 3.100092288878234], [487, [2267], None, 3.100092288878234], [489, [6852, 10985, 10988, 15945, 21788, 21853, 22063], None, 3.100092288878234], [491, [7641], None, 3.10009228887823

index <class 'list'>
query <class 'str'>
query tokenized: ['bliblablubber'] 

erhaltene Dokumente [] 

sorted ranking []


# 4. Performance Evaluation 
We will evaluate the performance of our search systems on the basis of five queries. Two of them are available to you now for engineering purposes:
- `tübingen attractions`
- `food and drinks`

The remaining three queries will be given to you during our final session on July 23rd. Please be prepared to run your systems and produce a single result file for all five queries live in class. That means you should aim for processing times of no more than ~1 minute per query. We will ask you to send carsten.eickhoff@uni-tuebingen.de that file.
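To check the ~1 minute per query budget before the live session, a small timing harness can help. The following is only a sketch: `search_fn` stands in for whatever retrieval function your system exposes (a hypothetical name, not part of this sheet), and the 60-second budget is taken from the guideline above.

```python
import time


def time_queries(search_fn, queries, budget_seconds=60.0):
    """Run each query once and record how long it took.

    search_fn is a placeholder for your own retrieval function (assumption):
    it takes a query string and returns a list of results.
    Returns a dict: query -> (elapsed_seconds, within_budget, num_results).
    """
    timings = {}
    for query in queries:
        start = time.perf_counter()
        results = search_fn(query)
        elapsed = time.perf_counter() - start
        timings[query] = (elapsed, elapsed <= budget_seconds, len(results))
    return timings


def dummy_search(query):
    # Stand-in for a real search function, used only to make the sketch runnable.
    return [f"doc-for-{token}" for token in query.split()]


if __name__ == "__main__":
    report = time_queries(dummy_search, ["tübingen attractions", "food and drinks"])
    for query, (elapsed, ok, n) in report.items():
        status = "ok" if ok else "over budget"
        print(f"{query!r}: {elapsed:.4f}s, {n} results, {status}")
```

Running this against the two known queries gives an early warning if index loading or ranking is too slow to finish live in class.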

# Grading
Your final projects will be graded along the following criteria:
- 25% Code correctness and quality (to be delivered on this sheet)
- 25% Report (4 pages, PDF, explanation and justification of your design choices)
- 25% System performance (based on how well your system performs on the 5 queries relative to the other teams in terms of nDCG)
- 15% Creativity and innovativeness of your approach (in particular with respect to your search system #2 and user interface #3 innovations)
- 10% Presentation quality and clarity
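
Since system performance is compared in terms of nDCG, it can be useful to score your own rankings during development. The sketch below makes two assumptions that this sheet does not specify: it uses the linear-gain variant (relevance divided by `log2(rank + 1)`), and it builds the ideal ranking only from the relevance labels of the returned list, not from all judged documents. The relevance labels in the example are made up for illustration.

```python
import math


def dcg(relevances, k=None):
    """Discounted cumulative gain with linear gain (assumed variant)."""
    rels = relevances if k is None else relevances[:k]
    return sum(rel / math.log2(rank + 1) for rank, rel in enumerate(rels, start=1))


def ndcg(relevances, k=None):
    """Normalize DCG by the DCG of the same labels sorted in descending order.

    Assumption: the ideal ranking is derived from the given labels only.
    Returns 0.0 when no document in the list is relevant.
    """
    ideal_dcg = dcg(sorted(relevances, reverse=True), k)
    if ideal_dcg == 0:
        return 0.0
    return dcg(relevances, k) / ideal_dcg


if __name__ == "__main__":
    # Hypothetical graded relevance labels for the top 5 results of one query.
    ranking = [3, 0, 2, 1, 0]
    print(f"nDCG@5 = {ndcg(ranking, k=5):.4f}")
```

A perfectly ordered list scores 1.0; placing relevant documents lower in the list reduces the score.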

# Permissible libraries
You can use any general-purpose ML and NLP libraries such as scipy, numpy, scikit-learn, spacy, nltk, but please stay away from dedicated web crawling or search engine toolkits such as scrapy, whoosh, lucene, terrier, galago and the like. Pretrained models are fine to use as part of your system, as long as they have not been built/trained for retrieval. 
