TEXT PROCESSING


Text processing works on the output of the speech recognition step: the text transcribed from the audio input. In Python this transcript is an ordinary string, so you can apply whatever operations your application requires. Here are a few examples:

1. Text Analysis: Analyze the recognized text using natural language processing (NLP) techniques. You can use libraries such as NLTK (Natural Language Toolkit) or spaCy to perform tasks like part-of-speech tagging, named entity recognition, sentiment analysis, or topic extraction. The results show which people, places, and topics the speaker mentions and how the words in the utterance relate to one another.

In [2]:
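import nltk
from nltk.tokenize import word_tokenize, sent_tokenize
from nltk.tag import pos_tag
from nltk.chunk import ne_chunk

# Download the models and corpora used below (skipped if already present)
nltk.download('punkt')                       # tokenizer models
nltk.download('averaged_perceptron_tagger')  # part-of-speech tagger
nltk.download('words')                       # word list used by the NE chunker
# Note: ne_chunk also needs the 'maxent_ne_chunker' resource; run
# nltk.download('maxent_ne_chunker') if it is missing. Exact resource
# names can differ between NLTK versions.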


recognized_text = 'Edinburgh because well lots of buses and planes go high good place and I went to the eye boggling thing'

# Tokenize the recognized text into sentences and words
sentences = sent_tokenize(recognized_text)
words = word_tokenize(recognized_text)

# Perform part-of-speech tagging
pos_tags = pos_tag(words)

# Perform named entity recognition
named_entities = ne_chunk(pos_tags)


[nltk_data] Downloading package punkt to
[nltk_data]     C:\Users\c21054458\AppData\Roaming\nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package averaged_perceptron_tagger to
[nltk_data]     C:\Users\c21054458\AppData\Roaming\nltk_data...
[nltk_data]   Package averaged_perceptron_tagger is already up-to-
[nltk_data]       date!
[nltk_data] Downloading package words to
[nltk_data]     C:\Users\c21054458\AppData\Roaming\nltk_data...
[nltk_data]   Package words is already up-to-date!
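
The chunker returns a tree in which named entities are labeled subtrees and every other token is a plain (word, tag) tuple. The following is a minimal sketch of flattening such a tree into (entity, label) pairs. The helper name `extract_entities` is hypothetical, and the code assumes only the parts of the NLTK `Tree` interface it uses: a `label()` method and iteration over children.

```python
def extract_entities(tree):
    """Collect (entity_text, label) pairs from a chunk tree.

    Assumes nodes behave like nltk.Tree: named-entity subtrees expose
    label() and iterate over (word, tag) leaves, while ordinary tokens
    are plain (word, tag) tuples.
    """
    entities = []
    for node in tree:
        # Plain tokens are tuples and have no label() method, so skip them
        if hasattr(node, 'label'):
            text = ' '.join(word for word, _tag in node)
            entities.append((text, node.label()))
    return entities
```

After running the cell above, `extract_entities(named_entities)` should list whatever entities the chunker found. Which ones it finds in this transcript depends on the installed model, so no output is shown here.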


2. Entity Extraction: Extract specific information or entities from the recognized text. You can define patterns or use regular expressions to identify relevant entities such as names, dates, locations, or numbers. These extracted entities can then be used as parameters for your SPARQL query or for further processing.

In [3]:
import re

# Extract ISO-style dates (YYYY-MM-DD) using a regular expression
date_pattern = r'\d{4}-\d{2}-\d{2}'
dates = re.findall(date_pattern, recognized_text)

# Extract a name that follows the phrase "My name is"
name_pattern = r'My name is (\w+)'
match = re.search(name_pattern, recognized_text)
# Fall back to None so that `name` is always defined, even without a match
name = match.group(1) if match else None
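
When several entity types are needed at once, the individual patterns can be kept in one table and applied in a single pass. The sketch below illustrates that idea under stated assumptions: the table `ENTITY_PATTERNS`, the function `extract_with_patterns`, and the sample sentence are all hypothetical, and the patterns are deliberately simple rather than robust.

```python
import re

# Hypothetical pattern table: entity type -> regular expression
ENTITY_PATTERNS = {
    # ISO-style dates such as 2023-05-14
    'date': r'\b\d{4}-\d{2}-\d{2}\b',
    # Standalone numbers; the lookarounds skip digits that belong to a date
    'number': r'(?<![\d-])\d+(?:\.\d+)?(?![\d-])',
    # The word that follows "my name is"
    'name': r'[Mm]y name is (\w+)',
}

def extract_with_patterns(text, patterns=ENTITY_PATTERNS):
    """Return a dict mapping each entity type to the list of matches."""
    return {kind: re.findall(pattern, text) for kind, pattern in patterns.items()}

# Made-up sample sentence, not taken from the recognized audio
sample = 'My name is Alice and I arrived on 2023-05-14 with 2 bags'
entities = extract_with_patterns(sample)
print(entities)
```

Every entity type maps to a list, which is empty when nothing matches, so later steps can check for a value without catching exceptions.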


3. Query Generation: Generate a SPARQL query from the recognized text and the task or intent it expresses. You can build the query with string formatting, inserting the extracted entities into predefined templates.

In [10]:
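# Construct a SPARQL query from a template.
# Doubled braces {{ }} become literal braces; {0} is replaced by the entity IRI.
query_template = "SELECT ?property ?value WHERE {{ <{0}> ?property ?value }}"
# Placeholder IRI; in practice this would come from the extracted entities
entity = "http://example.org/entity"
sparql_query = query_template.format(entity)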


In [11]:
sparql_query

'SELECT ?property ?value WHERE { <http://example.org/entity> ?property ?value }'
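
Because the entity comes from recognized speech, it may be worth checking it before it is placed in the template. The sketch below is one possible approach, not a complete sanitizer: the function `build_entity_query` is hypothetical, and it rejects only the characters that the SPARQL grammar forbids inside a `<...>` IRI reference.

```python
import re

QUERY_TEMPLATE = "SELECT ?property ?value WHERE {{ <{iri}> ?property ?value }}"

# Characters the SPARQL grammar does not allow inside <...> IRI references:
# < > " { } | ^ ` \ and anything from U+0000 to U+0020 (including spaces)
_INVALID_IRI_CHARS = re.compile(r'[<>"{}|^`\\\x00-\x20]')

def build_entity_query(iri):
    """Fill the template with `iri`, refusing values that would break the query."""
    if _INVALID_IRI_CHARS.search(iri):
        raise ValueError(f"not usable as an IRI in a SPARQL query: {iri!r}")
    return QUERY_TEMPLATE.format(iri=iri)

print(build_entity_query("http://example.org/entity"))
```

A value containing a space, a brace, or an angle bracket raises `ValueError` instead of silently producing a malformed or altered query.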