# Evolutionary strings

Tollef suggested using T5 [[1](#1)] to generate sequences of candidate words for summarization.
To handle that, we need to experiment with mutation, crossover, and similar operators on strings. Let's see:

In [46]:
using EvoLP

In [47]:
test = raw"Heisann bygd! Så god og grisende!"

"Heisann bygd! Så god og grisende!"

In [48]:
split(test)

6-element Vector{SubString{String}}:
 "Heisann"
 "bygd!"
 "Så"
 "god"
 "og"
 "grisende!"

Some preprocessing is needed to strip punctuation such as question and exclamation marks, which otherwise stays attached to the words (`"bygd!"`, `"grisende!"`). How is this usually done in NLP?
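One simple option, sketched here as an assumption rather than an established NLP recipe, is to delete every Unicode punctuation character with a regular expression before splitting. The helper name `tokenize` is hypothetical and not part of EvoLP:

```julia
# Hypothetical helper (not part of EvoLP): strip punctuation, then split on whitespace.
function tokenize(text::AbstractString)
    # \p{P} matches any Unicode punctuation character, so "!" and "?" are removed
    cleaned = replace(text, r"\p{P}" => "")
    # Convert the SubStrings returned by split into plain Strings
    return String.(split(cleaned))
end

tokens = tokenize("Heisann bygd! Så god og grisende!")
```

This keeps letters such as `å` intact because only punctuation is matched. Whether case folding or stop-word removal is also needed depends on the summarization task.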

Once split, we have a vector of words (`split` already returns a `Vector`, so the `Vector` call below is redundant but harmless). This vector is an individual:

In [49]:
ind = Vector(split(test))

6-element Vector{SubString{String}}:
 "Heisann"
 "bygd!"
 "Så"
 "god"
 "og"
 "grisende!"

In [56]:
M1 = SwapMutation()
M2 = InversionMutation()
M3 = ScrambleMutation()
M4 = InsertMutation()
C = OrderOneCrossover()

OrderOneCrossover()

In [51]:
newind = mutate(M1, ind)

6-element Vector{SubString{String}}:
 "Heisann"
 "og"
 "Så"
 "god"
 "bygd!"
 "grisende!"

In [52]:
mutate(M1, ind)

6-element Vector{SubString{String}}:
 "og"
 "bygd!"
 "Så"
 "god"
 "Heisann"
 "grisende!"

In [53]:
mutate(M2, ind)

6-element Vector{SubString{String}}:
 "Heisann"
 "bygd!"
 "god"
 "Så"
 "og"
 "grisende!"

In [54]:
mutate(M3, ind)

6-element Vector{SubString{String}}:
 "Så"
 "bygd!"
 "god"
 "Heisann"
 "og"
 "grisende!"

In [57]:
mutate(M4, ind)

6-element Vector{SubString{String}}:
 "Heisann"
 "bygd!"
 "Så"
 "god"
 "og"
 "grisende!"

In [55]:
cross(C, ind, newind)

6-element Vector{Any}:
 "Heisann"
 "Så"
 "bygd!"
 "god"
 "og"
 "grisende!"

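The good news is that all the combinatorial operators in EvoLP tried above work on word vectors out of the box. Note that `cross` returned a `Vector{Any}` instead of a `Vector{SubString{String}}`, so the element type is lost after crossover.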

Additional mutation ideas:

- Remove a word?
- Remove $k$ words?
- Swap two words that are at most a maximum distance apart?
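The ideas above can be sketched as plain Julia functions on word vectors. Both function names, `remove_words` and `bounded_swap`, and their parameters are hypothetical; they are not EvoLP operators and do not plug into `mutate` as written:

```julia
using Random

# Hypothetical operator: delete k randomly chosen words, keeping the order of the rest.
function remove_words(ind::AbstractVector, k::Integer=1; rng=Random.default_rng())
    n = length(ind)
    k = clamp(k, 0, max(n - 1, 0))       # always keep at least one word
    drop = randperm(rng, n)[1:k]         # positions to delete
    return ind[setdiff(1:n, drop)]       # setdiff preserves the original order
end

# Hypothetical operator: swap two words that are at most maxdist positions apart.
function bounded_swap(ind::AbstractVector, maxdist::Integer=2; rng=Random.default_rng())
    n = length(ind)
    n < 2 && return copy(ind)
    maxdist = max(maxdist, 1)            # a window of 0 would leave no partner
    i = rand(rng, 1:n)
    lo, hi = max(1, i - maxdist), min(n, i + maxdist)
    j = rand(rng, setdiff(lo:hi, [i]))   # partner inside the window, never i itself
    out = copy(ind)
    out[i], out[j] = out[j], out[i]
    return out
end

words = ["Heisann", "bygd", "Så", "god", "og", "grisende"]
shorter = remove_words(words, 2)
swapped = bounded_swap(words, 1)
```

Removing a single word is the special case `k = 1`. Unlike the permutation operators, `remove_words` changes the length of the individual, so any crossover applied afterwards has to cope with parents of different lengths.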

## Crossover

What about crossover? Which operators could we implement?
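One candidate, given here only as a sketch under the assumption that individuals may differ in length, is a one-point crossover that joins the head of one parent to the tail of the other. The name `one_point_words` is hypothetical and not an EvoLP operator:

```julia
using Random

# Hypothetical operator: one-point crossover for word vectors of possibly different lengths.
function one_point_words(a::AbstractVector, b::AbstractVector; rng=Random.default_rng())
    ca = rand(rng, 0:length(a))   # number of words taken from the head of a
    cb = rand(rng, 0:length(b))   # number of words skipped at the head of b
    return vcat(a[1:ca], b[cb+1:end])
end

p1 = ["Heisann", "bygd", "Så", "god", "og", "grisende"]
p2 = ["Så", "god", "bygd"]
child = one_point_words(p1, p2)
```

Unlike order-one crossover, the child is not a permutation of a parent: words can repeat or disappear, and the child can even be empty. That may or may not be desirable for summarization.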

## References

<span id="1">[1] [Exploring Transfer Learning with T5: the Text-to-Text Transfer Transformer](https://ai.googleblog.com/2020/02/exploring-transfer-learning-with-t5.html)</span>