### Importing a few useful functions

In [1]:
from useful_functions import _upper, _lower, _unique


## Functions for comparing two lists of strings 

**For Loops**

In [2]:
def str_contains_A(listOne, listTwo, unique=True, case=True):
    hits = []
    for i in listOne:
        for j in listTwo:
            if case:
                if i in j:
                    hits.append(j)           
            else:
                if i.lower() in j.lower():
                    hits.append(j)
    if unique:
        hits = _unique(hits)
    return hits

**List Comprehensions**

In [3]:
def str_contains_B(listOne, listTwo, unique=True, case=True):
    # Pick the comparison once, based on case sensitivity.
    if case is True:
        contains = lambda One,Two: [j for i in One for j in Two if i in j]
    else:
        contains = lambda One,Two: [j for i in One for j in Two if i.lower() in j.lower()]

    hits = contains(listOne, listTwo)
    hits = (_unique(hits) if unique else hits)
    return hits

**Introducing Exclusivity**

Sometimes we want only those entries of one list that contain *all* the words from another list. 
As an example, take a subset of matplotlib's rcParams and suppose we want *only* the entries containing *both* 'axes' and 'label', not 'axes' or 'label':
    
    list1 = ['axes', 'label']
    list2 = ['axes.grid', 'axes.grid.axis', 'axes.grid.which', 'axes.labelcolor', 
             'axes.labelpad', 'axes.labelpad', 'axes.labelsize', 'axes.labelweight',
             'axes.linewidth']

We want the function to return ONLY those strings in list2 that contain both 'axes' and 'label'. The functions above would return every string in list2, because each one contains at least one of 'axes' or 'label'. 

To achieve this, we use the Python built-in functions `any` and `all`. 

In [None]:
list2 = ['axes.grid', 'axes.grid.axis', 'axes.grid.which', 'axes.labelcolor', 
         'axes.labelpad', 'axes.labelpad', 'axes.labelsize', 'axes.labelweight',
         'axes.linewidth']

In [16]:
for j in list2:
    if ('axes' in j) and ('label' in j):  # replace with all
        print(j)

axes.labelcolor
axes.labelpad
axes.labelpad
axes.labelsize
axes.labelweight


In [18]:
for j in list2:
    if all(i in j for i in ['axes', 'label']):
        print(j)

axes.labelcolor
axes.labelpad
axes.labelpad
axes.labelsize
axes.labelweight


In [15]:
for j in list2:
    if ('axes' in j) or ('label' in j):  # replace with any
        print(j)

axes.grid
axes.grid.axis
axes.grid.which
axes.labelcolor
axes.labelpad
axes.labelpad
axes.labelsize
axes.labelweight
axes.linewidth


In [19]:
for j in list2:
    if any(i in j for i in ['axes', 'label']):
        print(j)

axes.grid
axes.grid.axis
axes.grid.which
axes.labelcolor
axes.labelpad
axes.labelpad
axes.labelsize
axes.labelweight
axes.linewidth


In [12]:
[j for j in list2 if all(i in j for i in ['axes', 'label'])]

['axes.labelcolor',
 'axes.labelpad',
 'axes.labelpad',
 'axes.labelsize',
 'axes.labelweight']

In [14]:
[j for j in list2 if any(i in j for i in ['axes', 'label'])]

['axes.grid',
 'axes.grid.axis',
 'axes.grid.which',
 'axes.labelcolor',
 'axes.labelpad',
 'axes.labelpad',
 'axes.labelsize',
 'axes.labelweight',
 'axes.linewidth']
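The cells further down call `str_contains_C`, whose definition does not appear in this notebook. One plausible shape, assuming it is the for-loop counterpart of the `any`/`all` comprehensions above, is sketched here. This is a reconstruction under that assumption, not the original definition, and the `_unique` stand-in is hypothetical as well.

```python
def _unique(items):
    # Hypothetical stand-in for useful_functions._unique (keeps first-seen order).
    return list(dict.fromkeys(items))


def str_contains_C(listOne, listTwo, unique=True, case=True, exclusive=True):
    # Assumed reconstruction: a for loop over listTwo using all/any.
    check = all if exclusive else any
    hits = []
    for j in listTwo:
        if case:
            matched = check(i in j for i in listOne)
        else:
            matched = check(i.lower() in j.lower() for i in listOne)
        if matched:
            hits.append(j)
    if unique:
        hits = _unique(hits)
    return hits


list1 = ['axes', 'label']
list2 = ['axes.grid', 'axes.labelcolor', 'axes.labelpad',
         'axes.labelpad', 'axes.linewidth']

print(str_contains_C(list1, list2))
# ['axes.labelcolor', 'axes.labelpad']
print(str_contains_C(list1, list2, exclusive=False))
# ['axes.grid', 'axes.labelcolor', 'axes.labelpad', 'axes.linewidth']
```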

**List Comprehensions with `any` and `all`**

These default to `exclusive=True`, meaning an entry of list2 is returned only if it contains every word in list1. With `exclusive=False`, an entry is returned if it contains at least one of the words.

In [5]:
def str_contains_D(listOne, listTwo, unique=True, case=True, exclusive=True):
    # exclusive=True uses all(); exclusive=False uses any().
    if case is True:
        if exclusive is True:
            contains = lambda One,Two: [j for j in Two if all(i in j for i in One)]
        else:
            contains = lambda One,Two: [j for j in Two if any(i in j for i in One)]
    else:
        if exclusive is True:
            contains = lambda One,Two: [j for j in Two if all(i.lower() in j.lower() for i in One)]
        else:
            contains = lambda One,Two: [j for j in Two if any(i.lower() in j.lower() for i in One)]

    hits = contains(listOne, listTwo)
    hits = (_unique(hits) if unique else hits)
    return hits


In [6]:
import re

In [7]:
def str_contains_E(listOne, listTwo, unique=True, case=True, exclusive=True):
    # Each item of listOne is treated as a regular-expression pattern.
    if case is True:
        if exclusive is True:
            contains = lambda One,Two: [j for j in Two if all(re.search(i, j) is not None for i in One)]
        else:
            # Appends j once per matching pattern, as versions A and B do.
            contains = lambda One,Two: [j for j in Two for i in One if re.search(i, j) is not None]
    else:
        if exclusive is True:
            contains = lambda One,Two: [j for j in Two if all(re.search(i.lower(), j.lower()) is not None for i in One)]
        else:
            contains = lambda One,Two: [j for j in Two for i in One if re.search(i.lower(), j.lower()) is not None]

    hits = contains(listOne, listTwo)
    hits = (_unique(hits) if unique else hits)
    return hits
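Because `re.search` treats each item of listOne as a regular-expression pattern, characters such as `.` match more than the literal character. The sketch below shows one way to get literal matching and case-insensitivity with `re.escape` and `re.IGNORECASE`. The name `contains_literal` is hypothetical and the deduplication step is left out for brevity.

```python
import re


def contains_literal(words, strings, case=True, exclusive=True):
    # Hypothetical helper: escape each word so it is matched literally.
    flags = 0 if case else re.IGNORECASE
    patterns = [re.compile(re.escape(w), flags) for w in words]
    check = all if exclusive else any
    # A Match object is truthy and None is falsy, so all/any work directly.
    return [s for s in strings if check(p.search(s) for p in patterns)]


strings = ['axes.grid', 'axes_grid', 'Axes.Grid']

# Unescaped, '.' matches any character, so 'axes_grid' is also a hit.
print([s for s in strings if re.search('axes.grid', s)])
# ['axes.grid', 'axes_grid']

print(contains_literal(['axes.grid'], strings))
# ['axes.grid']
print(contains_literal(['axes.grid'], strings, case=False))
# ['axes.grid', 'Axes.Grid']
```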

In [8]:
list1 = ['Kim', 'zoldak']
list2 = ['kim meyers', 'kim jones', 'kimberly zoldak', 'kimberly anne']

In [None]:
list1 = ['k', 'an']
list2 = ['kim meyers', 'kim jones', 'kimberly zoldak', 'kimberly anne', 'k zoldak']

### Lowercase names in list1 and list2
No need to be concerned with case sensitivity in this example. 

In [None]:
list1 = ['kim', 'zoldak']
list2 = ['kim meyers', 'kim jones', 'kimberly zoldak', 'kimberly anne']

In [None]:
str_contains_A(list1, list2)  # DEFAULTS: unique=True, case=True

In [None]:
str_contains_B(list1, list2) # DEFAULTS: unique=True, case=True

In [None]:
str_contains_C(list1, list2) # DEFAULTS: unique=True, case=True, exclusive=True

In [None]:
str_contains_D(list1, list2) # DEFAULTS: unique=True, case=True, exclusive=True

### Uppercase names in list1
Case sensitivity matters in this example. 

In [None]:
list1 = ['Kim', 'Zoldak']
list2 = ['kim meyers', 'kim jones', 'kimberly zoldak', 'kimberly anne']

**Version A**

In [None]:
str_contains_A(list1, list2)  # DEFAULT

In [None]:
str_contains_A(list1, list2, case=False) # CHANGE CASE SENSITIVE TO FALSE

**Version B**

In [None]:
str_contains_B(list1, list2) # DEFAULT

In [None]:
str_contains_B(list1, list2, case=False) # CHANGE CASE SENSITIVE TO FALSE

**Version C**

In [None]:
str_contains_C(list1, list2) # DEFAULT

In [None]:
str_contains_C(list1, list2, case=False) # CHANGE CASE SENSITIVE TO FALSE

**Version D**

In [None]:
str_contains_D(list1, list2) # DEFAULT

In [None]:
str_contains_D(list1, list2, case=False) # CHANGE CASE SENSITIVE TO FALSE

**Versions C and D**

In versions C and D, we have an `exclusive` argument that can be set to either `True` or `False`. The default is `exclusive=True`, which is why we only see the result that matches both strings in list1. If we change this to `exclusive=False`, these two return the same matches as versions A and B. 

In [None]:
str_contains_C(list1, list2, case=False, exclusive=False)

In [None]:
str_contains_D(list1, list2, case=False, exclusive=False)
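The raw hits behind `exclusive=False` and behind versions A and B are not quite identical before the uniqueness step. The sketch below strips both approaches down to their comprehensions to show the difference. The helper names are hypothetical, and lowercase inputs are used so that case handling plays no part.

```python
def contains_nested(words, strings):
    # Version A/B style: outer loop over words, inner loop over strings.
    return [s for w in words for s in strings if w in s]


def contains_any(words, strings):
    # Version D style with exclusive=False: a single pass over strings.
    return [s for s in strings if any(w in s for w in words)]


list1 = ['kim', 'zoldak']
list2 = ['kim meyers', 'kim jones', 'kimberly zoldak', 'kimberly anne']

print(contains_nested(list1, list2))
# ['kim meyers', 'kim jones', 'kimberly zoldak', 'kimberly anne', 'kimberly zoldak']
print(contains_any(list1, list2))
# ['kim meyers', 'kim jones', 'kimberly zoldak', 'kimberly anne']
```

The nested form appends a string once for every word it contains, so 'kimberly zoldak' appears twice. The `any` form appends each string at most once, in list2 order. After deduplication the two hold the same set of matches.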

### Uppercase names in list2
Case sensitivity matters in this example. 

In [None]:
list1 = ['kim', 'zoldak']
list2 = ['Kim Meyers', 'Kim Jones', 'Kimberly Zoldak', 'Kimberly Anne']

**Version A**

In [None]:
str_contains_A(list1, list2)  # DEFAULT

In [None]:
str_contains_A(list1, list2, case=False) # CHANGE CASE SENSITIVE TO FALSE

**Version B**

In [None]:
str_contains_B(list1, list2) # DEFAULT

In [None]:
str_contains_B(list1, list2, case=False) # CHANGE CASE SENSITIVE TO FALSE

**Version C**

In [None]:
str_contains_C(list1, list2) # DEFAULT

In [None]:
str_contains_C(list1, list2, case=False) # CHANGE CASE SENSITIVE TO FALSE

**Version D**

In [None]:
str_contains_D(list1, list2) # DEFAULT

In [None]:
str_contains_D(list1, list2, case=False) # CHANGE CASE SENSITIVE TO FALSE

### Testing Uniqueness
For this one we will turn off `exclusive=True` for Versions C and D. This is because only one response would be returned, and we would like to see whether numerous responses are returned that are also unique.

In [None]:
list1 = ['dog', ]
list2 = ['Dog:Fido', 'horse:Seabiscuit', 'horse:Molly', 'cat:Oscar', 
         'horse:Bron', 'hampster:Chip', 'dog:Lucy', 'dog:bella',
         'cat:simba', 'dog:Blue', 'dog:Pete', 'turtle:Crush',
         'dog:bella',  'dog:Fido']

**Version A**

In [None]:
str_contains_A(list1, list2, case=True)  # misses 'Dog:Fido'

In [None]:
str_contains_A(list1, list2, case=False)   # returns 'Dog:Fido'

In [None]:
# RETURNS TWO 'dog:bella'
str_contains_A(list1, list2, case=True, unique=False)  

**Version B**

In [None]:
# RETURNS TWO 'dog:bella'
str_contains_B(list1, list2, case=False, unique=False) 

In [None]:
print( str_contains_A(list1, list2, unique=False) )
print( str_contains_B(list1, list2, unique=False) )
print( str_contains_C(list1, list2, unique=False) )
print( str_contains_D(list1, list2, unique=False) )

In [None]:
print( str_contains_A(list1, list2, unique=True) )
print( str_contains_B(list1, list2, unique=True) )
print( str_contains_C(list1, list2, unique=True) )
print( str_contains_D(list1, list2, unique=True) )

In this example, it doesn't matter that we didn't set `exclusive=False`, because we are searching for only one word: `dog`. But just to prove it:

In [None]:
print( str_contains_A(list1, list2, unique=True) )
print( str_contains_B(list1, list2, unique=True) )
print( str_contains_C(list1, list2, unique=True, exclusive=False) )
print( str_contains_D(list1, list2, unique=True, exclusive=False) )

In [None]:
print( str_contains_A(list1, list2, unique=True) )
print( str_contains_B(list1, list2, unique=True) )
print( str_contains_C(list1, list2, unique=True, exclusive=True) )
print( str_contains_D(list1, list2, unique=True, exclusive=True) )

---

In [None]:
print( str_contains_A(list1, list2, case=False, unique=True) )
print( str_contains_B(list1, list2, case=False, unique=True) )
print( str_contains_C(list1, list2, case=False, unique=True) )
print( str_contains_D(list1, list2, case=False, unique=True) )

There are two 'dog:bella' in this list, and this only returned one of them. So, the unique argument worked as desired. 

However, further inspection reveals that it did not treat 'Dog:Fido' and 'dog:Fido' as duplicates. 
The function I designed applies the case setting ONLY when it is checking for a match between two strings. If there is a match, the string from list2 is appended in the EXACT form in which it exists within list2, not a lowercase or uppercase version of it. Once the list of matches is formed, it is passed to the unique function, which naturally misses duplicates that differ only in case. In most instances, this is desired. Python is case sensitive, meaning variable `name` and variable `Name` are actually two different Python objects, so we would want both returned. If this is not desired, the user can handle it outside of the function. See below. 

In [None]:
_unique( [i.lower() for i in str_contains_A(list1, list2, case=False, unique=True)] )

In [None]:
# OR 
import numpy as np

np.unique( [i.lower() for i in str_contains_A(list1, list2, case=False, unique=True)] )
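Both options above lowercase the returned strings, and `np.unique` also returns its result sorted. If the original spelling and order should survive, one alternative is to compare on a case-folded key while keeping the first spelling seen. This is a minimal sketch, and the name `unique_ignore_case` is hypothetical.

```python
def unique_ignore_case(items):
    # Hypothetical helper: dedupe on a case-folded key, keep the first spelling seen.
    seen = set()
    result = []
    for item in items:
        key = item.casefold()
        if key not in seen:
            seen.add(key)
            result.append(item)
    return result


# The 'dog' matches from list2 above, in list2 order, with duplicates kept.
hits = ['Dog:Fido', 'dog:Lucy', 'dog:bella', 'dog:Blue',
        'dog:Pete', 'dog:bella', 'dog:Fido']

print(unique_ignore_case(hits))
# ['Dog:Fido', 'dog:Lucy', 'dog:bella', 'dog:Blue', 'dog:Pete']
```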

---

In [None]:
str_contains_B(list1, list2, case=True)  # misses 'Dog:Fido'
