In this article, we discuss how to search a file for one or more strings and retrieve every matching line together with its line number. This is especially helpful when we are given a large CSV file with no prior knowledge of what the dataset looks like.

We intend to demonstrate three functionalities:
1) Check if a string exists in a file
2) Search for a string in a file and get all lines containing the string, along with their line numbers
3) Search for multiple strings in a file and get the lines containing any of them, along with their line numbers

In [1]:
import os

In [2]:
os.chdir("C:\\Users\\GAO\\GAO_Jupyter_Notebook\\Datasets")

#### I. Check if a string exists in a file

To check whether a given string exists in a file, we can create the function below. It iterates over the file line by line and checks whether each line contains the given string. As soon as a line contains the string, the function returns True; if no line in the file contains it, the function returns False:

In [3]:
def check_single_string_in_file(file_name, string_to_search):
    """Check whether any line in the file contains the given string."""
    # Open the file in read-only mode and read it line by line
    with open(file_name, 'r') as read_obj:
        for line in read_obj:
            # Stop at the first line that contains the string
            if string_to_search in line:
                print(string_to_search + ' exists in the given file: ' + file_name)
                return True
    print('The given string does not exist in the current file!')
    return False

In [4]:
check_single_string_in_file('MB_Equity_withtypo.csv','Famale')

Famale exists in the given file: MB_Equity_withtypo.csv


True

In [5]:
check_single_string_in_file('MB_Equity_withtypo.csv','blah')

The given string does not exist in the current file!


False
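
The same check can also be written more compactly with the built-in `any()`, which stops at the first matching line just as the early `return True` does. This is only a minimal sketch: the name `string_exists_in_file` and the optional `encoding` parameter are assumptions introduced for illustration, and unlike the function above it prints nothing.

```python
def string_exists_in_file(file_name, string_to_search, encoding=None):
    """Return True if any line of the file contains string_to_search."""
    # Hypothetical helper; encoding=None keeps the platform default,
    # which is what a plain open(file_name, 'r') uses as well.
    with open(file_name, "r", encoding=encoding) as read_obj:
        # any() consumes the generator lazily and stops at the first match
        return any(string_to_search in line for line in read_obj)
```

Passing an explicit `encoding` may help when a CSV file was saved with an encoding different from the platform default.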

#### II. Search for a string in a file and get all lines containing the string along with line numbers

The function below does the following:
   1. Creates an empty list of tuples.
   2. Opens the file at the given path in read-only mode.
   3. Iterates over the file line by line, keeping track of the line number. If a line contains the given string, it appends a tuple of the line number and the line (with trailing whitespace stripped) to the list.
   4. Returns the list of tuples, i.e., the matched lines along with their line numbers.

In [6]:
def search_string_in_file(file_name, string_to_search):
    """Search the file for the given string and return the matching lines with their line numbers."""
    list_of_results = []
    # Open the file in read-only mode
    with open(file_name, 'r') as read_obj:
        # enumerate(..., start=1) gives 1-based line numbers
        for line_number, line in enumerate(read_obj, start=1):
            if string_to_search in line:
                # Store the line number and the line (trailing whitespace removed) as a tuple
                list_of_results.append((line_number, line.rstrip()))
    # Return the list of (line_number, line) tuples
    return list_of_results

In [7]:
search_string_in_file('MB_Equity_withtypo.csv','Female')

[(3, '20194,Female,White,High'), (5, '20163,Female,Non-White,High')]
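
For readers without the dataset at hand, the behaviour can be reproduced on a small temporary file. The sketch below is built on assumptions: the four data rows are copied from the outputs shown in this article, while the header row is made up (the real header of `MB_Equity_withtypo.csv` is not shown here). The helper `find_lines_containing` is a hypothetical, compact variant of the search written as a list comprehension over `enumerate`.

```python
import os
import tempfile


def find_lines_containing(file_name, string_to_search):
    """Return (line_number, line) tuples for lines containing the string."""
    with open(file_name, "r") as read_obj:
        # enumerate(..., start=1) yields 1-based line numbers
        return [(line_number, line.rstrip())
                for line_number, line in enumerate(read_obj, start=1)
                if string_to_search in line]


# Hypothetical sample: the header is invented, the data rows come from
# the outputs printed in this article.
sample_rows = [
    "ID,Gender,Race,Level",
    "20184,Famale,White,High",
    "20194,Female,White,High",
    "20154,Famale,White,High",
    "20163,Female,Non-White,High",
]

with tempfile.TemporaryDirectory() as tmp_dir:
    sample_path = os.path.join(tmp_dir, "sample.csv")
    with open(sample_path, "w") as sample_file:
        sample_file.write("\n".join(sample_rows) + "\n")
    female_matches = find_lines_containing(sample_path, "Female")
    typo_matches = find_lines_containing(sample_path, "Famale")

print(female_matches)  # rows on lines 3 and 5
print(typo_matches)    # rows on lines 2 and 4
```

Because the match is a plain substring test, the misspelled `Famale` rows are not returned when searching for `Female`, which is what makes this kind of search handy for spotting typos.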

#### III. Search for multiple strings in a file and get lines containing any of them along with line numbers

To search for multiple strings in a file, calling the function above once per string would work, but it would open, read, and close the file once for every string. Therefore, we create a separate function that opens the file once and collects the lines that contain any of the given strings:

In [8]:
def search_multiple_strings_in_file(file_name, list_of_strings):
    """Return the lines that contain any string from the list, with the matched string and line number.

    list_of_strings is expected to be a list in which each element is a string.
    """
    list_of_results = []
    with open(file_name, 'r') as read_obj:
        for line_number, line in enumerate(read_obj, start=1):
            # Check the current line against every string in the list
            for string_to_search in list_of_strings:
                if string_to_search in line:
                    # Record the matched string, the line number, and the line
                    list_of_results.append((string_to_search, line_number, line.rstrip()))
    # Return the list of (matched_string, line_number, line) tuples
    return list_of_results

In [9]:
search_multiple_strings_in_file('MB_Equity_withtypo.csv', ['Famale','840816571'])

[('Famale', 2, '20184,Famale,White,High'),
 ('Famale', 4, '20154,Famale,White,High')]
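
Note that `'840816571'` leaves no trace in the output above, so a string that never matches is silent. One way to make this visible is to group the results by search string, with an empty list for strings that were not found. This is a minimal sketch, assuming the `(string, line_number, line)` tuple layout returned above; `group_matches_by_string` is a hypothetical helper name.

```python
def group_matches_by_string(list_of_strings, list_of_results):
    """Map each search string to its list of (line_number, line) matches."""
    # Start every string with an empty list so unmatched strings still appear
    grouped = {search_string: [] for search_string in list_of_strings}
    for matched_string, line_number, line in list_of_results:
        grouped[matched_string].append((line_number, line))
    return grouped


# Results copied from the output above
results = [('Famale', 2, '20184,Famale,White,High'),
           ('Famale', 4, '20154,Famale,White,High')]
grouped = group_matches_by_string(['Famale', '840816571'], results)
print(grouped)
```

Here `grouped['840816571']` is an empty list, which tells us explicitly that the string does not occur in the file.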

References:
   - https://thispointer.com/python-search-strings-in-a-file-and-get-line-numbers-of-lines-containing-the-string/