# Textual Search

This notebook provides an interactive way to perform textual search in real time.

* Author: Jessica Silva
* Date: 27-04-2022

In [11]:
import pandas as pd
from pathlib import Path
from src.data.read_dataset import DatasetReader

pd.set_option('display.max_colwidth', None)

## 1. Data

The dataset is a `.txt` file (e.g. `source.txt`) stored in the **./data/** directory. It contains the **source text** and the **search term**:
* **Source text**: lines of text, each containing three words mixed with symbols, numbers and spaces.
* **Search term**: a single word, always on the last line of the file.
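The `DatasetReader` used below is not shown in this notebook. As a rough illustration of the format only, here is a minimal sketch of how such a file could be parsed. It assumes that a "word" is a run of ASCII letters and that everything else (symbols, digits, extra whitespace) is noise. The names `parse_search_text` and `read_search_file` are hypothetical and are not the project's API.

```python
import re
from pathlib import Path

# Assumption: a "word" is a run of ASCII letters; everything else is noise.
WORD_RE = re.compile(r"[A-Za-z]+")


def parse_search_text(raw: str) -> tuple[list[str], str]:
    """Hypothetical parser: split file contents into source lines and search term.

    Assumes the last non-empty line holds the search term and every
    preceding non-empty line holds source text.
    """
    lines = [line for line in raw.splitlines() if line.strip()]
    if not lines:
        raise ValueError("file is empty")
    *source_lines, term_line = lines
    # Keep only the letter runs of each source line, joined by single spaces.
    source = [" ".join(WORD_RE.findall(line)) for line in source_lines]
    # The search term is expected to be a single word on the last line.
    terms = WORD_RE.findall(term_line)
    if len(terms) != 1:
        raise ValueError(f"expected one search term, got {terms!r}")
    return source, terms[0]


def read_search_file(name: str, data_dir: str = "data") -> tuple[list[str], str]:
    """Hypothetical loader: read <data_dir>/<name>, which must be a .txt file."""
    path = Path(data_dir) / name
    if path.suffix != ".txt":
        raise ValueError(f"expected a .txt file, got {path.name}")
    return parse_search_text(path.read_text(encoding="utf-8"))
```

For example, `parse_search_text("12 Alice was beginning!!\n#to get  very 3\ner\n")` returns `(['Alice was beginning', 'to get very'], 'er')`.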

In [12]:
# Defining the source file
SEARCH_FILE = "source.txt"

In [13]:
# Loading the data
df_search, search_term = DatasetReader(SEARCH_FILE).get_data()

In [14]:
# Source dataset
df_search

|   | text |
|---|------|
| 0 | Alice was beginning |
| 1 | to get very |
| 2 | tired of sitting |
| 3 | by her sister |
| 4 | on the bank |
| 5 | and of having |
| 6 | nothing to do |


In [15]:
# Search term
search_term

'er'

## 2. Textual Search

Search each line of the **Source text** for the **Search term** and output every line that contains it.
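The implementation of `TextualSearch` is not shown here, but the output further down (`er` matches both *very* and *her*/*sister*) indicates a substring match rather than a whole-word match. The following is a minimal sketch of that behaviour under this assumption. `search_lines` is a hypothetical name. Its `case_sensitive` option is an added assumption, not something the notebook documents. Like the output cell, it returns each matching line as a list of words.

```python
def search_lines(lines: list[str], term: str, case_sensitive: bool = True) -> list[list[str]]:
    """Hypothetical search: return the word list of every line in which
    at least one word contains `term` as a substring."""
    if not case_sensitive:
        term = term.lower()
    matches = []
    for line in lines:
        words = line.split()
        # Compare lowercased copies when case-insensitive, but return the original words.
        haystack = words if case_sensitive else [w.lower() for w in words]
        # Substring test per word, so "er" matches both "very" and "her".
        if any(term in word for word in haystack):
            matches.append(words)
    return matches
```

With the seven lines of `df_search`, `search_lines(lines, "er")` returns `[['to', 'get', 'very'], ['by', 'her', 'sister']]`, which the output cell prints as `[to get very]` and `[by her sister]`.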

In [16]:
from src.features.textual_search import TextualSearch

In [17]:
# Performing a search
output_search = TextualSearch(df_search, search_term).search_term()

In [18]:
# Output
# Print each matching line (a list of words) inside brackets
for out in output_search:
    print("[" + " ".join(out) + "]")

[to get very]
[by her sister]


## 3. Interactive Demo

The search runs in real time: editing either field updates the output.

* **source**: add or remove lines of text.
* **term**: change the search term.
* **Output**: the lines that match the search term.
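`utils.search_interactive` is not shown in this notebook. The sketch below is one possible callback for the two widgets. It assumes that `source` is either the Python list literal produced by `str(df_search.text.values.tolist())` or plain text with one line per row, and that matching is a plain substring test. `search_interactive_sketch` is a hypothetical name.

```python
import ast


def search_interactive_sketch(source: str, term: str) -> list[str]:
    """Hypothetical callback: parse the Textarea contents, print and return matches."""
    try:
        # Accept a Python list literal such as "['Alice was beginning', ...]".
        lines = ast.literal_eval(source)
        if not isinstance(lines, list):
            raise ValueError("not a list")
    except (ValueError, SyntaxError, TypeError):
        # Otherwise treat the text as one line per row.
        lines = source.splitlines()
    term = term.strip()
    # An empty term would match every line, so it matches nothing instead.
    matches = [str(line) for line in lines if term and term in str(line)]
    for line in matches:
        print(f"[{line}]")
    return matches
```

In the notebook it could be passed to `interact(search_interactive_sketch, source=SOURCE, term=TERM)` in place of `utils.search_interactive`. ipywidgets' `interact` calls the function again whenever either widget's value changes.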

In [19]:
import utils
from ipywidgets import interact, Textarea

In [20]:
SOURCE = Textarea(value=str(df_search.text.values.tolist()), layout={'width': '90%', 'height': '100px'})
TERM = Textarea(value=search_term, layout={'width': '90%', 'height': '30px'})
_interact = interact(utils.search_interactive, source=SOURCE, term=TERM)
