# iPhone Product Search

This notebook reads the metadata for cell phones and accessories and runs a simple substring search for "iphone" in each product's title and description. The first step is to look for products that we know are 100% compatible with a specific iPhone model. Based on the data, we look at the iPhone 7, iPhone 6/6S, iPhone 5/5S, and iPhone 4/4S. Note that there may be discrepancies in our results.

## Query Product
The code below opens the gzipped metadata file, iterates through each document, and checks whether the given field (```category```, for example the title or the description) contains the query string, ignoring case. If it does, the entire document is appended to a list, which is returned at the end. The time complexity is O(n), where n is the number of documents in the gzip file.

In [11]:
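import ast
import gzip

META_CELLPHONE = 'Datasets/meta_Cell_Phones_and_Accessories.json.gz'

def query_product(file_name, query, category):
    query = query.lower()
    results = []
    # Open in text mode and close the file automatically when done.
    with gzip.open(file_name, 'rt', encoding='utf-8') as g:
        for line in g:
            # Each line is expected to be a dict literal; literal_eval parses
            # it without executing arbitrary code, unlike eval.
            document = ast.literal_eval(line)
            if category in document and query in document[category].lower():
                results.append(document)
    return results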
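
The query step can be tried out without the full dataset. The sketch below is a generator variation on the same idea, run against a small temporary gzip file. The helper name `iter_matches` and the sample records are made up for illustration, and the sketch assumes every line is a plain dict literal so that `ast.literal_eval` can parse it.

```python
import ast
import gzip
import os
import tempfile

def iter_matches(file_name, query, category):
    """Yield documents whose `category` field contains `query`, ignoring case."""
    query = query.lower()
    with gzip.open(file_name, 'rt', encoding='utf-8') as g:
        for line in g:
            document = ast.literal_eval(line)
            # Documents without the field are treated as having an empty one.
            if query in document.get(category, '').lower():
                yield document

# Made-up sample records, one dict literal per line (an assumption about the format).
sample_lines = [
    "{'asin': 'A1', 'title': 'Case for iPhone 5'}",
    "{'asin': 'A2', 'title': 'Galaxy S4 charger', 'description': 'Also fits iPhone 4'}",
    "{'asin': 'A3', 'description': 'Generic screen protector'}",
]

with tempfile.TemporaryDirectory() as tmp_dir:
    sample_path = os.path.join(tmp_dir, 'sample.json.gz')
    with gzip.open(sample_path, 'wt', encoding='utf-8') as f:
        f.write('\n'.join(sample_lines) + '\n')
    # Consume the generators while the temporary file still exists.
    title_hits = list(iter_matches(sample_path, 'iPhone', 'title'))
    description_hits = list(iter_matches(sample_path, 'iPhone', 'description'))

print([d['asin'] for d in title_hits])
print([d['asin'] for d in description_hits])
```

A generator avoids holding every match in memory at once, which may matter for large files; wrapping the call in `list(...)` gives back a list when one is needed.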

## Filter Results
The code below takes the results from ```query_product``` and returns only the items whose given field (for example the title or the description) contains the query string. ```filter_multiple``` does the same for a list of queries and keeps an item if any of them matches. Because each function returns a list in the same format as its input, the filters can be applied repeatedly to narrow the results step by step. The time complexity is O(n), where n is the number of documents passed in.

In [12]:
def filter_results(results, query, category):
    query = query.lower()
    new_results = []
    for result in results:
        # Some items lack the field (e.g. no title), so default to ''.
        if query in result.get(category, '').lower():
            new_results.append(result)
    return new_results

def filter_multiple(results, queries, category):
    queries = [query.lower() for query in queries]
    new_results = []
    for result in results:
        # Some items lack the field (e.g. no title), so default to ''.
        document_category = result.get(category, '').lower()
        # Keep the item if any of the queries matches.
        if any(query in document_category for query in queries):
            new_results.append(result)
    return new_results
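
As a rough illustration of applying a filter repeatedly, the sketch below narrows a list twice. The helper `filter_by_substring` is a compact, hypothetical stand-in for the filtering idea above, and the sample records are invented.

```python
def filter_by_substring(results, query, category):
    """Keep items whose `category` field contains `query`, ignoring case."""
    query = query.lower()
    return [r for r in results if query in r.get(category, '').lower()]

# Made-up sample records for illustration only.
products = [
    {'title': 'Leather case for iPhone 5'},
    {'title': 'iPhone 5 car charger'},
    {'title': 'Leather case for Galaxy S4'},
    {'description': 'No title on this item'},
]

# First narrow to iPhone items, then narrow those to cases.
iphone_items = filter_by_substring(products, 'iPhone', 'title')
iphone_cases = filter_by_substring(iphone_items, 'case', 'title')
print([item['title'] for item in iphone_cases])
```

Each call only scans the list it is given, so later filters get cheaper as the list shrinks.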

## Removing Duplicates
Items that match in both the title and the description appear in both result lists. The function below merges the two lists and keeps each such item only once.

In [13]:
def remove_dups(iphone_title_results, iphone_description_results):
    # Start with a copy of the title results.
    iphone_results = list(iphone_title_results)
    for result in iphone_description_results:
        # Only add description results that are not already in the title results.
        if result not in iphone_title_results:
            iphone_results.append(result)
    # Return after the loop has finished, not inside it.
    return iphone_results
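
Checking membership with `in` on a list compares against every element, so merging this way takes time proportional to the product of the two list lengths. If each item carries a unique identifier, a set of seen identifiers makes the merge roughly linear. The sketch below assumes every item has a unique `'asin'` field, which is an assumption about the data rather than something established above; `merge_unique` is a hypothetical name.

```python
def merge_unique(first, second, key='asin'):
    """Merge two lists of dicts, keeping the first item seen for each key value."""
    seen = set()
    merged = []
    for item in list(first) + list(second):
        # Assumes every item has the key; a missing key raises KeyError.
        if item[key] not in seen:
            seen.add(item[key])
            merged.append(item)
    return merged

# Made-up sample records for illustration only.
title_hits = [{'asin': 'A1', 'title': 'iPhone 5 case'},
              {'asin': 'A2', 'title': 'iPhone 4 cable'}]
description_hits = [{'asin': 'A2', 'title': 'iPhone 4 cable'},
                    {'asin': 'A3', 'description': 'Fits iPhone 5'}]

merged = merge_unique(title_hits, description_hits)
print([item['asin'] for item in merged])
```

Unlike the list-based version, this also drops duplicates that occur within a single input list.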

## Search for iPhone

We use the query function to find all products that contain the word "iPhone" in their title or description, then print the first 10 titles and the first 10 descriptions to show that it works.

In [4]:
iphone_title_results = query_product(META_CELLPHONE, 'iPhone', 'title')
iphone_description_results = query_product(META_CELLPHONE, 'iPhone', 'description')
iphone_results = remove_dups(iphone_title_results, iphone_description_results)
print('Total iPhone title results: {}'.format(len(iphone_title_results)))
# Slicing avoids an IndexError when there are fewer than 10 results.
for i, result in enumerate(iphone_title_results[:10]):
    print('{}. {}'.format(i+1, result['title']))
print('Total iPhone description results: {}'.format(len(iphone_description_results)))
for i, result in enumerate(iphone_description_results[:10]):
    print('{}. {}'.format(i+1, result['description']))

## Filtering for specific type of iPhone within iPhone Results

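Here we create a function that filters the iPhone results for a bare model number. It first sets aside the items already matched for the specific model, then searches the remaining items for the number on its own. For example, this takes care of "iPhone 6,7" when looking for an iPhone 7.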

In [14]:
def filter_specific_iphone(iphone_num, iphone_results, specific_iphone_results):
    non_specific_iphone_results = []
    for i in range(0, len(iphone_results)):
        if iphone_results[i] in specific_iphone_results:
            continue
        else:
            non_specific_iphone_results.append(iphone_results[i])
    multiple_specific_iphone_title_results = filter_results(non_specific_iphone_results, iphone_num, 'title')
    multiple_specific_iphone_description_results = filter_results(non_specific_iphone_results, iphone_num, 'description')
    multiple_specific_iphone_results = remove_dups(multiple_specific_iphone_title_results, multiple_specific_iphone_description_results)
    return multiple_specific_iphone_results
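
Searching for a bare number as a substring also matches unrelated digits, such as "7" in "7 inch". One possible refinement, sketched below, is to extract only the model numbers that are listed directly after the word "iPhone". The names `MODEL_LIST`, `MODEL_TOKEN`, and `mentioned_models` are hypothetical, and the pattern is a rough heuristic that only handles lists separated by commas, slashes, "&", or "and".

```python
import re

# "iphone" followed by one or more model tokens such as "6", "6s", "5, 6", "4/4S".
MODEL_LIST = re.compile(
    r'iphone\s*(\d+s?(?:\s*(?:,|/|&|and)\s*\d+s?)*)',
    re.IGNORECASE,
)
MODEL_TOKEN = re.compile(r'\d+s?', re.IGNORECASE)

def mentioned_models(text):
    """Return the set of lower-cased model tokens listed right after 'iPhone'."""
    models = set()
    for match in MODEL_LIST.finditer(text):
        for token in MODEL_TOKEN.findall(match.group(1)):
            models.add(token.lower())
    return models

print(mentioned_models('Case for iPhone 6, 7'))
print(mentioned_models('iPhone 4/4S cover with 7 inch cable'))
print(mentioned_models('Samsung Galaxy S7 case'))
```

An item would then count as an iPhone 7 product when `'7' in mentioned_models(title)`. Lists written in other ways, or numbers with decimals right after a model list, can still be misread by this heuristic.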

## Filtering for iPhone 7

We use the filter function to narrow the iPhone results down to the iPhone 7. Based on the results, it is apparent that the iPhone 7 is not in the data set, since the data predates the iPhone 7. To test this further, we also return the iPhone results that contain a bare 7. For example, "iPhone 6, 7" should be returned as one of the results. We print the title and description of each result so the output is easy to read.

In [15]:
def filter_iphone_7(iphone_results):
    iphone_7_title_results = filter_multiple(iphone_results, ['iPhone 7', 'iPhone7'], 'title')
    iphone_7_description_results = filter_multiple(iphone_results, ['iPhone 7', 'iPhone7'], 'description')
    iphone_7_results = remove_dups(iphone_7_title_results, iphone_7_description_results)
    multiple_iphone_7_results = filter_specific_iphone('7', iphone_results, iphone_7_results)
    for i in range(0,len(multiple_iphone_7_results)):
        iphone_7_results.append(multiple_iphone_7_results[i])
    return iphone_7_results

In [17]:
iphone_7_results = filter_iphone_7(iphone_results)
print('Total: {}'.format(len(iphone_7_results)))
for i in range(0,len(iphone_7_results)):
    # Some items lack a title or a description, so default to ''.
    print('{}. {}'.format(i+1, iphone_7_results[i].get('title', '')))
    print('{}. {}'.format(i+1, iphone_7_results[i].get('description', '')))

## Filtering for iPhone 6/6S
We use the filter function to narrow the iPhone results down to the iPhone 6 and iPhone 6S. We also return the iPhone results that contain a bare 6 or 6S. For example, "iPhone 5, 6, 6S" should be returned as one of the results.

In [18]:
def filter_iphone_6_6S(iphone_results):
    iphone_6_6S_title_results = filter_multiple(iphone_results, ['iPhone 6', 'iPhone 6S', 'iPhone6', 'iPhone6S'], 'title')
    iphone_6_6S_description_results = filter_multiple(iphone_results, ['iPhone 6', 'iPhone 6S', 'iPhone6', 'iPhone6S'], 'description')
    iphone_6_6S_results = remove_dups(iphone_6_6S_title_results, iphone_6_6S_description_results)
    # This should take care of both 6 and 6S filtering.
    multiple_iphone_6_6S_results = filter_specific_iphone('6', iphone_results, iphone_6_6S_results)
    for i in range(0,len(multiple_iphone_6_6S_results)):
        iphone_6_6S_results.append(multiple_iphone_6_6S_results[i])
    return iphone_6_6S_results

In [19]:
iphone_6_6S_results = filter_iphone_6_6S(iphone_results)
print('Total: {}'.format(len(iphone_6_6S_results)))
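# Slice to avoid an IndexError with fewer than 10 results;
# some items lack a title or a description, so default to ''.
for i, result in enumerate(iphone_6_6S_results[:10]):
    print('{}. {}'.format(i+1, result.get('title', '')))
    print('{}. {}'.format(i+1, result.get('description', '')))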

## Filtering for iPhone 5/5S
We use the filter function to narrow the iPhone results down to the iPhone 5 and iPhone 5S. We also return the iPhone results that contain a bare 5 or 5S. For example, "iPhone 6, 5S" should be returned as one of the results.

In [20]:
def filter_iphone_5_5S(iphone_results):
    iphone_5_5S_title_results = filter_multiple(iphone_results, ['iPhone 5', 'iPhone 5S', 'iPhone5', 'iPhone5S'], 'title')
    iphone_5_5S_description_results = filter_multiple(iphone_results, ['iPhone 5', 'iPhone 5S', 'iPhone5', 'iPhone5S'], 'description')
    iphone_5_5S_results = remove_dups(iphone_5_5S_title_results, iphone_5_5S_description_results)
    # This should take care of both 5 and 5S filtering.
    multiple_iphone_5_5S_results = filter_specific_iphone('5', iphone_results, iphone_5_5S_results)
    for i in range(0,len(multiple_iphone_5_5S_results)):
        iphone_5_5S_results.append(multiple_iphone_5_5S_results[i])
    return iphone_5_5S_results

In [21]:
iphone_5_5S_results = filter_iphone_5_5S(iphone_results)
print('Total: {}'.format(len(iphone_5_5S_results)))
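# Slice to avoid an IndexError with fewer than 10 results;
# some items lack a title or a description, so default to ''.
for i, result in enumerate(iphone_5_5S_results[:10]):
    print('{}. {}'.format(i+1, result.get('title', '')))
    print('{}. {}'.format(i+1, result.get('description', '')))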

## Filtering for iPhone 4/4S

We use the filter function to narrow the iPhone results down to the iPhone 4 and iPhone 4S. We also return the iPhone results that contain a bare 4 or 4S. For example, "iPhone 6, 5, 4" should be returned as one of the results.

In [22]:
def filter_iphone_4_4S(iphone_results):
    iphone_4_4S_title_results = filter_multiple(iphone_results, ['iPhone 4', 'iPhone 4S', 'iPhone4', 'iPhone4S'], 'title')
    iphone_4_4S_description_results = filter_multiple(iphone_results, ['iPhone 4', 'iPhone 4S', 'iPhone4', 'iPhone4S'], 'description')
    iphone_4_4S_results = remove_dups(iphone_4_4S_title_results, iphone_4_4S_description_results)
    # This should take care of both 4 and 4S filtering.
    multiple_iphone_4_4S_results = filter_specific_iphone('4', iphone_results, iphone_4_4S_results)
    for i in range(0,len(multiple_iphone_4_4S_results)):
        iphone_4_4S_results.append(multiple_iphone_4_4S_results[i])
    return iphone_4_4S_results

In [23]:
iphone_4_4S_results = filter_iphone_4_4S(iphone_results)
print('Total: {}'.format(len(iphone_4_4S_results)))
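# Slice to avoid an IndexError with fewer than 10 results;
# some items lack a title or a description, so default to ''.
for i, result in enumerate(iphone_4_4S_results[:10]):
    print('{}. {}'.format(i+1, result.get('title', '')))
    print('{}. {}'.format(i+1, result.get('description', '')))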
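
The four model-specific functions above differ only in the model number, so they could be folded into one parameterized function. The sketch below is one possible way to do that, not the notebook's own code: `filter_iphone_model` and `_matches` are hypothetical names, and the sample records are invented. Since "iPhone 6S" always contains "iPhone 6", the sketch searches only for the shorter form.

```python
FIELDS = ('title', 'description')

def _matches(item, needles):
    """True if any needle occurs in the item's title or description, ignoring case."""
    return any(needle in item.get(field, '').lower()
               for field in FIELDS for needle in needles)

def filter_iphone_model(iphone_results, model):
    """Return items naming the model directly, then items with the bare number."""
    model = model.lower()
    specific_needles = ['iphone {}'.format(model), 'iphone{}'.format(model)]
    specific = [item for item in iphone_results if _matches(item, specific_needles)]
    # Among the remaining items, look for the bare number (e.g. "iPhone 5, 6").
    loose = [item for item in iphone_results
             if item not in specific and _matches(item, [model])]
    return specific + loose

# Made-up sample records for illustration only.
sample_results = [
    {'asin': 'A1', 'title': 'iPhone 6 case'},
    {'asin': 'A2', 'title': 'Cover', 'description': 'Fits iPhone6S'},
    {'asin': 'A3', 'title': 'Case for iPhone 5, 6'},
    {'asin': 'A4', 'title': 'iPhone 4 cable'},
]

iphone_6_results = filter_iphone_model(sample_results, '6')
print([item['asin'] for item in iphone_6_results])
```

The order of the returned items may differ from the functions above, which list title matches before description matches, but the same items should be selected.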