# re.findall(): Finding all matches in a string

In [2]:
import re

In [3]:
text = "Sixty-six undergraduate students from an urban college in New York City participated in this study. Students participated in this research study as part of a requirement for a class. Nineteen participants were excluded from the study for meeting one or more exclusion criteria including not completing the sentence unscrambling task, not providing ratings for each of the traits, or failing either one of the two attention checks, leaving the total number of eligible participants at forty-seven. Participants included 36 women aged 18 to 52 years old (M = 20.52, SD = 6.97), 10 men aged 18 to 28 years old (M = 23.5, SD = 12.36), and one individual who did not disclose their sex."



## Extract all occurrences of numbers

In [4]:
# Raw string so that \d reaches the regex engine unchanged
numbers = r"\d+"
re.findall(numbers, text)

['36',
 '18',
 '52',
 '20',
 '52',
 '6',
 '97',
 '10',
 '18',
 '28',
 '23',
 '5',
 '12',
 '36']

In [5]:
# Optional integer part, optional decimal point, then at least one digit
all_numbers = r"\d*\.?\d+"
re.findall(all_numbers, text)

['36', '18', '52', '20.52', '6.97', '10', '18', '28', '23.5', '12.36']

## Extract all occurrences of specific words using the "|" operator

In [64]:
aquarium='Because of problems with her eyesight, rey the African penguin had issues with swimming. That’s unusual for a penguin, and presented a big challenge for our aviculture team to help Rey overcome her hesitancy. Slowly and steadily, we trained her to be comfortable feeding in the water like the rest of the penguin colony. The aviculturists also trained Rey to accept daily eye drops from them as part of her special health care. Rey already had good relationships with some staff, and was comfortable with them handling her. Senior Aviculturist Kim Fukuda says the team built on those bonds to get Rey used to receiving the eye drops. "She knows the routine," Kim says. "I usually give her the eye drops in one area of the exhibit after all the penguins get their vitamins. When that happens, she runs over there and waits for me." Rosa, our oldest sea otter, has very limited eyesight, among other health issues. The sea otter team had already trained Rosa so they could examine her eyes, and built on that trust to include administering the eye drops she needs.'


In [45]:
rey_occurrences = "Rey"
re.findall(rey_occurrences, aquarium)

['Rey', 'Rey', 'Rey', 'Rey']

In [46]:
rey_occurrences = "Rey"
re.findall(rey_occurrences, aquarium, flags=re.IGNORECASE)

['rey', 'Rey', 'Rey', 'Rey', 'Rey']

In [47]:
sea_animals="Rey|Rosa"
re.findall(sea_animals,aquarium,flags=re.IGNORECASE)

['rey', 'Rey', 'Rey', 'Rey', 'Rey', 'Rosa', 'Rosa']

## Extracting words that contain only letters

In [48]:
# Raw string: the backslashes are kept literally and no invalid-escape warning is raised
gifts = r"\Basketball    2    25.63\Tshirt     4   53.92\Sneakers    1    30.58\Mask    10   80.54\GiftCard    2    50.00"

In [49]:
words = '[a-z]+'
re.findall(words,gifts,flags=re.IGNORECASE)

['Basketball', 'Tshirt', 'Sneakers', 'Mask', 'GiftCard']

In [50]:
# Listing both cases in the character class makes re.IGNORECASE unnecessary
words = "[a-zA-Z]+"
re.findall(words, gifts)

['Basketball', 'Tshirt', 'Sneakers', 'Mask', 'GiftCard']

## Extracting words that follow a specific pattern

In [68]:
# Capture the text between each pair of double quotes (non-greedy)
quotes = r'"(.*?)"'
re.findall(quotes, aquarium)

['She knows the routine,',
 'I usually give her the eye drops in one area of the exhibit after all the penguins get their vitamins. When that happens, she runs over there and waits for me.']
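The `?` after `.*` makes the pattern non-greedy, which is why each quotation comes back separately. As a rough sketch of what would happen without it, the cell below compares both forms on a shortened sentence. The `line`, `lazy`, and `greedy` names are illustrative and not part of the original notebook.

```python
import re

# Shortened, illustrative sentence containing two quoted passages
line = '"She knows the routine," Kim says. "I usually give her the eye drops."'

# Non-greedy: stops at the nearest closing quote, so each quotation is separate
lazy = re.findall(r'"(.*?)"', line)

# Greedy: runs to the last closing quote, merging everything in between
greedy = re.findall(r'"(.*)"', line)

print(lazy)
print(greedy)
```

The non-greedy form returns two items, while the greedy form returns a single item that spans from the first opening quote to the last closing quote.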

# re.match(): Matching a pattern at the beginning of a string

In [16]:
pattern='Rosa'
aquarium_short="Because of problems with her eyesight, rey the African penguin had issues with swimming."
result=re.match(pattern, aquarium_short)

if result:
    print("Search successful.")
else:
    print("Search unsuccessful.")

Search unsuccessful.


In [30]:
pattern='Because'
aquarium_short="Because of problems with her eyesight, rey the African penguin had issues with swimming. That’s unusual for a penguin, and presented a big challenge for our aviculture team to help Rey overcome her hesitancy."
result=re.match(pattern, aquarium_short)

if result:
    print("Search successful.")
else:
    print("Search unsuccessful.")

Search successful.


In [31]:
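# Named "phrases" to avoid shadowing the built-in list type
phrases = ["dog dot", "data day", "no match"]

# Loop.
for element in phrases:
    # Match if the string starts with two words that each begin with the letter "d".
    m = re.match(r"(d\w+)\W(d\w+)", element)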

    # See if success.
    if m:
        print(m.groups())

('dog', 'dot')
('data', 'day')


re.match() and re.search() differ in where they look. Both return a match object for the first match they find, but re.match() tries the pattern only at the beginning of the string. If the only occurrence of the pattern is somewhere in the middle of the string, re.match() returns None.
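A minimal sketch of the difference between re.match() and re.search(), run on the same sentence used earlier. The `sentence`, `match_result`, and `search_result` names are illustrative and not part of the original notebook.

```python
import re

sentence = "Because of problems with her eyesight, rey the African penguin had issues with swimming."

# re.match() tries the pattern only at position 0, so "penguin" is not found
match_result = re.match("penguin", sentence)

# re.search() scans forward and finds "penguin" in the middle of the string
search_result = re.search("penguin", sentence)

print(match_result)
print(search_result.group(), search_result.start())
```

Here re.match() returns None, while re.search() returns a match object whose start() gives the position of the word in the sentence.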

# re.search(): Finding pattern in text

re.search(), by contrast, scans the whole string, including strings that span multiple lines, and returns a match object for the first location where the pattern matches. If the pattern is not found anywhere, it returns None.

In [35]:
patterns=['penguin','Rosa']
aquarium_short="Because of problems with her eyesight, rey the African penguin had issues with swimming."
for pattern in patterns:
    print('Looking for "%s" in "%s" = '% (pattern, aquarium_short), end=" ")
    
    if re.search(pattern, aquarium_short):
        print("Match was found")
    else:
        print("No match was found")

Looking for "penguin" in "Because of problems with her eyesight, rey the African penguin had issues with swimming." =  Match was found
Looking for "Rosa" in "Because of problems with her eyesight, rey the African penguin had issues with swimming." =  No match was found
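The multi-line case can be sketched in the same way. The string `two_lines` below is made up for illustration. The cell also shows the `re.MULTILINE` flag, which lets `^` match at the start of every line rather than only at the start of the string.

```python
import re

# Illustrative two-line string; "Rosa" appears only on the second line
two_lines = "Rey is an African penguin.\nRosa is a sea otter."

# re.match() checks only the very beginning of the string
at_start = re.match("Rosa", two_lines)

# re.search() keeps scanning past the newline
anywhere = re.search("Rosa", two_lines)

# By default ^ anchors to the start of the string only
anchored = re.search("^Rosa", two_lines)

# With re.MULTILINE, ^ also matches immediately after each newline
anchored_multiline = re.search("^Rosa", two_lines, flags=re.MULTILINE)

print(at_start)
print(anywhere.group())
print(anchored)
print(anchored_multiline.group())
```

Only the plain re.search() call and the re.MULTILINE call find "Rosa"; the other two return None.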
