# Term Search in Documents

## Objective
The goal of this exercise is to develop a simple information retrieval system that allows the user to search for a specific term across a set of text documents. This will introduce you to the basics of text processing and searching algorithms in the context of information retrieval.

## Problem Description
You are provided with a set of text documents. Your task is to implement a search function that:
- Takes a term entered by the user as the query.
- Searches for this term across all the provided documents.
- Returns a list of documents where the term appears.

## Requirements

### Step 1: Preparing the Data
- **Load the Documents**: You will start by loading the text documents into your program. These documents can be in plain text format stored in a directory.
- **Read Each Document**: Implement a function to read each document and store its contents in a data structure of your choice (e.g., a list).

### Step 2: Implementing the Search
- **Input Query**: Implement a function to accept a query term from the user.
- **Search Function**: Create a function that:
  - Iterates through each document.
  - Checks if the query term appears in the document.
  - Optionally ignores case when matching, to improve the user experience.
- **Return Results**: The function should return the names or identifiers of the documents where the term is found.
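
The "Input Query" step can be sketched as a small helper that keeps asking until it gets a usable term. This is only a minimal sketch: the name `read_query` and its `input_fn` parameter are hypothetical (they are not part of the exercise), and `input_fn` exists only so the function can be exercised without a keyboard.

```python
def read_query(prompt="Search term: ", input_fn=input):
    """Ask for a query term until a non-empty one is entered.

    `input_fn` is a hypothetical hook: it defaults to the built-in input(),
    but any callable that takes a prompt and returns a string will do.
    """
    while True:
        # Strip surrounding whitespace so "  hello " and "hello" behave the same.
        term = input_fn(prompt).strip()
        if term:
            return term
        print("Please enter a non-empty search term.")
```

In a notebook, `search_term = read_query()` would then prompt for the term interactively.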

### Step 3: Displaying Results
- **Output the Results**: For each search query, output the results in a user-friendly format, listing the documents where the term was found, or a message indicating that the term does not appear in any document.
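
One possible way to cover both cases described above (matches found, or none) is to build the report as a string and print it. The function name `format_results` and the exact wording of the messages are assumptions made for illustration.

```python
def format_results(term, matches):
    """Build a readable report for one query.

    `matches` is the list of document names returned by the search step.
    """
    if not matches:
        return "The term '{}' does not appear in any document.".format(term)
    lines = ["The term '{}' was found in {} document(s):".format(term, len(matches))]
    lines.extend("- {}".format(name) for name in matches)
    return "\n".join(lines)


# Example with made-up inputs: one query with matches, one without.
print(format_results("hello", ["Dubliners.txt", "Ulysses.txt"]))
print(format_results("zyzzyva", []))
```

Returning a string instead of printing directly keeps the formatting logic easy to check separately from the search itself.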

## Evaluation Criteria
- **Correctness**: The search function should accurately identify documents containing the term.
- **Efficiency**: Speed is not critical for a small collection, but consider how the cost of your search grows with the number and size of the documents.
- **Usability**: The interface for inputting search terms and viewing results should be clear and easy to use.
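
On the efficiency point, a common technique in information retrieval is to build an inverted index once and answer each query with a dictionary lookup instead of rescanning every document. The sketch below is illustrative only: `build_index` and `lookup` are hypothetical names, and the tokenization with `\w+` is an assumption. Note that it matches whole words, so it is not equivalent to a substring search.

```python
import re
from collections import defaultdict


def build_index(documents):
    """Map each lowercased word to the set of document names containing it.

    `documents` is a list of (name, content) tuples.
    """
    index = defaultdict(set)
    for name, content in documents:
        # set(...) so each distinct word is added once per document.
        for word in set(re.findall(r"\w+", content.lower())):
            index[word].add(name)
    return index


def lookup(index, term):
    """Return the sorted names of documents containing `term` as a whole word."""
    # .get() avoids creating an empty entry for unknown terms.
    return sorted(index.get(term.lower(), set()))


# Made-up sample data for illustration.
sample = [
    ("a.txt", "Hello there, world"),
    ("b.txt", "Othello is a play"),
    ("c.txt", "hello again"),
]
index = build_index(sample)
print(lookup(index, "HELLO"))  # ['a.txt', 'c.txt']
```

Building the index costs one pass over the whole collection, so it pays off when many queries are run against the same documents.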

## Additional Challenges (Optional)
- **Enhance the search functionality**: Allow for more complex queries, such as phrases or multiple terms.
- **Improve the search with regular expressions**: Use regex for pattern matching to enhance the flexibility of the search.
- **Implement a simple ranking system**: Rank the documents based on the frequency of the term within each document.
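
Two of these challenges can be sketched in a few lines. The functions below are hypothetical illustrations, not a reference solution: `rank_by_frequency` orders documents by how often the term occurs (counting non-overlapping, case-insensitive substring matches), and `search_all_terms` keeps only documents that contain every term. Because both use substring matching, a phrase such as `"hello world"` can be passed as a single term.

```python
def rank_by_frequency(documents, term):
    """Return (name, count) pairs for documents containing `term`, most frequent first."""
    needle = term.lower()
    if not needle:
        # An empty string "occurs" everywhere, so treat it as no query at all.
        return []
    counts = []
    for name, content in documents:
        count = content.lower().count(needle)  # non-overlapping occurrences
        if count > 0:
            counts.append((name, count))
    # Sort by descending count, then by name so ties have a predictable order.
    return sorted(counts, key=lambda pair: (-pair[1], pair[0]))


def search_all_terms(documents, terms):
    """Return names of documents that contain every term (case-insensitive substrings)."""
    needles = [t.lower() for t in terms]
    results = []
    for name, content in documents:
        lowered = content.lower()
        if all(n in lowered for n in needles):
            results.append(name)
    return results


# Made-up sample data for illustration.
sample = [
    ("a.txt", "hello hello world"),
    ("b.txt", "Hello there"),
    ("c.txt", "goodbye world"),
]
print(rank_by_frequency(sample, "hello"))             # [('a.txt', 2), ('b.txt', 1)]
print(search_all_terms(sample, ["hello", "world"]))   # ['a.txt']
```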

This exercise will help you understand the fundamental mechanisms behind storing and retrieving data in the field of information retrieval. By the end of this task, you will have a basic prototype that mimics core functions of larger, more complex search engines.


## Development
- Fernando Cardenas

In [39]:
import os

In [40]:
# Local path to the directory containing the .txt documents
directory = r'C:\Users\usuario\Fer-Pc\Escritorio\EPN\2024-A\SEPTIMO_SEMESTRE\RECUPERACION_DE_INFORMACION\ir24a\week01\data'

In [41]:
def load_documents(directory):
    """Return a list of (filename, content) tuples for every .txt file in `directory`."""
    documents = []
    for filename in os.listdir(directory):
        if filename.endswith('.txt'):
            path = os.path.join(directory, filename)
            with open(path, 'r', encoding='utf-8') as file:
                content = file.read()
                documents.append((filename, content))
    return documents

In [42]:
# Load the documents from the directory
documents = load_documents(directory)
# Show the number of loaded documents
print("Loaded {} documents.".format(len(documents)))

Loaded 98 documents.


In [43]:
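def search_in_documents(documents, term):
    """Return the names of documents whose text contains `term` (case-insensitive substring match)."""
    term_lower = term.lower()  # lowercase the query once, not once per document
    results = []
    for name, content in documents:
        if term_lower in content.lower():
            results.append(name)
    return results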
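
Because `search_in_documents` tests for a substring, a query such as `hello` also matches inside longer words (for example `Othello`). If only whole-word matches are wanted, one option is a regular expression with word boundaries. The variant below is a sketch with a hypothetical name, `search_whole_word`; it assumes the term starts and ends with a word character, since `\b` only behaves as a word boundary in that case.

```python
import re


def search_whole_word(documents, term):
    """Return names of documents containing `term` as a whole word, ignoring case."""
    # re.escape keeps characters like '.' or '*' in the term from acting as regex syntax.
    pattern = re.compile(r"\b{}\b".format(re.escape(term)), re.IGNORECASE)
    return [name for name, content in documents if pattern.search(content)]


# Made-up sample data for illustration.
sample = [
    ("a.txt", "Hello there"),
    ("b.txt", "Othello is a play"),
]
print(search_whole_word(sample, "hello"))  # ['a.txt']
```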

In [44]:
# Search term (hard-coded here; replace with input() to read it from the user)
search_term = "hello"

# Perform the search in the loaded documents
search_results = search_in_documents(documents, search_term)

# Show the search results
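if search_results:
    print("The term '{}' was found in the following documents:".format(search_term))
    for result in search_results:
        print("- {}".format(result))
else:
    # Step 3 also asks for a message when nothing matches
    print("The term '{}' does not appear in any document.".format(search_term))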

The term 'hello' was found in the following documents:
- Adventures of Huckleberry Finn.txt
- Biographical Anecdotes of William Hogarth, With a Catalogue of His Works.txt
- Dubliners.txt
- Fan Fare May 1953.txt
- History of Tom Jones, a Foundling.txt
- Kentucky in American Letters.txt
- My Life — Volume 1.txt
- Roget's Thesaurus of English Words and Phrases.txt
- Standard Selections.txt
- The Adventures of Tom Sawyer.txt
- The Brothers Karamazov.txt
- The Complete Works of William Shakespeare.txt
- The Count of Monte Cristo.txt
- The Great Gatsby.txt
- The Metamorphoses of Ovid.txt
- The Reign of Greed.txt
- The Souls of Black Folk.txt
- The Wonderful Wizard of Oz.txt
- Ulysses.txt
