# REGULAR EXPRESSIONS

The **re** module provides the functions needed to search and analyze text with regular expressions in Python.

**import re**

**re.search(pattern,string)**

The function takes two arguments: the pattern and the string to search in.


To tell Python not to interpret escape sequences such as *\t* or *\n*, put an *r* in front of the opening quote (a raw string): **print(r"\t\n")**
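
As a small illustration of the difference, the sketch below compares a normal string literal with a raw one. The variable names `plain` and `raw` are only illustrative.

```python
# Compare a normal string literal with a raw string literal.
plain = "\t\n"   # two characters: a tab and a newline
raw = r"\t\n"    # four characters: backslash, t, backslash, n

print(len(plain))  # 2
print(len(raw))    # 4
print(raw)         # \t\n
```

Raw strings are convenient for patterns because regular expressions use backslashes themselves, for example in `\d` or `\s`.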

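**re.search()** returns a match object if the pattern is found and **None** otherwise. A match object is always truthy, so the result can be used directly as the condition of an *if* statement.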
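
A minimal sketch of what `re.search()` hands back in each case. The sequence and the two patterns are illustrative choices, not taken from a particular dataset.

```python
import re

dna = "ATCGCGAATTCAC"

hit = re.search(r"GAATTC", dna)   # this pattern occurs in dna
miss = re.search(r"GGATCC", dna)  # this pattern does not occur in dna

print(hit is None)            # False: a match object was returned
print(miss is None)           # True: no match, so None was returned
print(bool(hit), bool(miss))  # True False
```

This is why the result works as the condition of an `if` statement: a match object counts as true and `None` counts as false.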

In [1]:
import re

In [None]:
# re.search(pattern, string)
# the r prefix makes Python treat backslashes literally (raw string)

In [2]:
dna = "ATCGCGAATTGAFCAC"
if re.search(r"GAZ", dna):
    print("restriction enzyme site found!")
else:
    print("No site found!")

No site found!


In [3]:
# Alternation
dna = "ATCGCGAATTCAC"
if re.search(r"ATT", dna) or re.search(r"CGAA", dna):
    print("restriction enzyme site found!")

restriction enzyme site found!


In [4]:
dna = "ATCGCGAATTCAC"
if re.search(r"GC(G|C)AATT", dna):
    print("restriction enzyme site found!")

restriction enzyme site found!
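
The `|` operator also works at the top level of a pattern, so the two separate searches joined with `or` in the earlier alternation cell could probably be written as a single search. A minimal sketch using the same sequence:

```python
import re

dna = "ATCGCGAATTCAC"

# One pattern with top-level alternation instead of two searches joined by `or`.
m = re.search(r"ATT|CGAA", dna)
if m:
    print("restriction enzyme site found: " + m.group())
```

The search reports the leftmost match in the string, which here is `CGAA`, because it starts before the first `ATT`.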


In [5]:
# Character groups
dna = "ATCGCGGCAATTCAC"
if re.search(r"GC(A|T|G|C)GC", dna):
    print("restriction enzyme site found!")

restriction enzyme site found!


In [6]:
dna = "ATCGCGGCAATTCAC"
if re.search(r"GC[ATGC]GC", dna):
    print("restriction enzyme site found!")

restriction enzyme site found!


In [7]:
# Negation
dna = "ATCGCGAATTCAC"
if re.search(r"GC[^N]", dna):  # [^N] matches any single character except N
    print("restriction enzyme site found!")

restriction enzyme site found!


In [8]:
# Extracting the matched part of the string into a variable
dna = "ATGACGTACGTACGACTG"
# store the match object in the variable m
m = re.search(r"GA[ATGC]{3}AC",dna)
m_pattern = m.group()
print(m_pattern)

GACGTAC


In [None]:
# \d{5}: any digit, 5 times; \s: any whitespace character; \t: tab; \n: newline; \w: any word character (letter, digit or underscore)
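
A short sketch of these shorthand classes in use. The `record` string is a made-up example (an identifier, a tab, then a five-digit number) and is not part of the original notebook.

```python
import re

# Hypothetical record: an identifier, a tab, then a 5-digit number.
record = "gene_7\t40213"

digits = re.search(r"\d{5}", record).group()  # first run of exactly 5 digits
word = re.search(r"\w+", record).group()      # first run of word characters
space = re.search(r"\s", record).group()      # first whitespace character

print(digits)         # 40213
print(word)           # gene_7
print(space == "\t")  # True
```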

In [None]:
m = re.search(r"GA[ATGC]{3}AC",dna)
m
# m_pattern = m.group()  # stores the matched text
#print(m_pattern)
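
When the pattern is not found, `re.search()` returns `None`, and calling `m.group()` on it raises an `AttributeError`. A minimal sketch of guarding against that case, using a pattern chosen on purpose so that it does not occur in the sequence:

```python
import re

dna = "ATGACGTACGTACGACTG"

m = re.search(r"GA[ATGC]{3}TT", dna)  # this pattern does not occur in dna
if m is None:
    print("no match")
else:
    print(m.group())
```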

In [9]:
# Extract more than one part of the match
dna = "ATGACGTACGTACGACTG"
m = re.search(r"GA([ATGC]{3})AC([ATGC]{2})AC", dna)
print("entire match: " + m.group())
print("first bit: " + m.group(1))
print("second bit: " + m.group(2))

entire match: GACGTACGTAC
first bit: CGT
second bit: GT


In [10]:
# Some methods
dna = "ATGACGTACGTACGACTG"
m = re.search(r"GA([ATGC]{3})AC([ATGC]{2})AC", dna)
print("input string: " + dna)
print("entire match: " + m.group())

input string: ATGACGTACGTACGACTG
entire match: GACGTACGTAC


In [11]:
# start(): position where the match begins
print("start: " + str(m.start()))

start: 2


In [12]:
# end(): position just after the last character of the match (exclusive)
print("end: " + str(m.end()))

end: 13
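
Because the value returned by `end()` is exclusive, slicing the input with `start()` and `end()` reproduces the matched text. The sketch below checks this for the same sequence and pattern, and also shows `span()` and the per-group positions.

```python
import re

dna = "ATGACGTACGTACGACTG"
m = re.search(r"GA([ATGC]{3})AC([ATGC]{2})AC", dna)

# end() is exclusive, so this slice equals m.group().
print(dna[m.start():m.end()])  # GACGTACGTAC
# span() returns start and end together as a tuple.
print(m.span())                # (2, 13)
# start() and end() also accept a group number.
print(m.start(1), m.end(1))    # 4 7
```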


In [13]:
# split(): split the string at every match of the pattern
dna = "ACTNGCATRGCTACGTYACGATSCGAWTCG"
runs = re.split(r"C", dna)
print(runs)

['A', 'TNG', 'ATRG', 'TA', 'GTYA', 'GATS', 'GAWT', 'G']
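
The pattern passed to `re.split()` can be any regular expression, not just a single letter. As a variation on the cell above, this sketch splits the same sequence at every character that is not A, T, G or C, which leaves only the runs of unambiguous bases.

```python
import re

dna = "ACTNGCATRGCTACGTYACGATSCGAWTCG"

# Split wherever a character other than A, T, G or C appears.
runs = re.split(r"[^ATGC]", dna)
print(runs)  # ['ACT', 'GCAT', 'GCTACGT', 'ACGAT', 'CGA', 'TCG']
```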


In [14]:
# Extracting with group()
dna = "ATGACGTACGTACGACTG"
m = re.search(r"GA([ATGC]{3})AC([ATGC]{2})AC", dna)
print("input string: " + dna)
print("entire match: " + m.group())
print("first bit: " + m.group(1))
print("second bit: " + m.group(2))

input string: ATGACGTACGTACGACTG
entire match: GACGTACGTAC
first bit: CGT
second bit: GT


In [15]:
# finditer(): detect the invalid bases (anything other than A, T, G or C)
dna = "CGCTCNTAGATGCGCRATGACTGCAYTGC"

matches = re.finditer(r"[^ATGC]", dna)
for m in matches:
    base = m.group()
    pos  = m.start()
    print(base + " found at position " + str(pos))

N found at position 5
R found at position 15
Y found at position 25


In [16]:
dna = "CGCTCNTAGATGCGCRATGACTGCAYTGC"

matches = re.finditer(r"[^ATGC]", dna)

for i in matches:
    match = i.group()
    pos = i.start()
    print(f"Base {match} is at position {pos}")

Base N is at position 5
Base R is at position 15
Base Y is at position 25


In [17]:
# findall(): return all non-overlapping matches as a list of strings
dna = "CTGCATTATATCGTACGAAATTATACGCGCG"
result = re.findall(r"[AT]{6,}", dna)
print(result)

['ATTATAT', 'AAATTATA']
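
`re.findall()` returns only the matched strings. When the positions are also needed, `re.finditer()` can be used with the same pattern. A minimal sketch, in which the list name `runs` is an illustrative choice:

```python
import re

dna = "CTGCATTATATCGTACGAAATTATACGCGCG"

# Collect each AT-rich run together with its start and end positions.
runs = [(m.group(), m.start(), m.end()) for m in re.finditer(r"[AT]{6,}", dna)]
for seq, start, end in runs:
    print(seq, start, end)
```

Each tuple holds the matched text followed by its start position and its exclusive end position in `dna`.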
