<a href="https://colab.research.google.com/github/pedroblossbraga/AlgoOptimize/blob/main/Otimiza%C3%A7%C3%A3o_de_busca.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Otimização de busca

Digamos que tenha duas sequências, e queira descobrir se cada valor de uma sequência está na outra, e registrar isso, sem perda de índices.


## Problema:
- Dadas duas sequências $\{ x_j \}_{j=1}^{N_1}$ , $\{ y_k \}_{k=1}^{N_2}$
- verificar se cada elemento $x_j$ está em $\{ y_k \}_{k=1}^{N_2}$


## Funções auxiliares
- Função de únicos

	$\mathcal{U}(s) = \cup_j s_j , \forall s_j \in s \\ \mathcal{U}: \mathbb{R}^n \rightarrow \mathbb{R}^n$

- Função de Comparação:

  $C(z_1, z_2) = 1 \Leftrightarrow z_1=z_2, \\ C(z_1, z_2) = 0 \Leftrightarrow z_1\neq z_2, $

## Soluções
- Busca bruta:

	$C(x_j, \{y_k\}), \forall x_j$
	
- Busca nos únicos:

	$C(x_j, \mathcal{U}(\{y_k\})), \forall x_j$
	
	$ |\mathcal{U}(\{ y_k \})| \leq | \{ y_k \} | \Rightarrow O(C(x_j, \mathcal{U}( \{ y_k \} ))) \leq O(C(x_j, \{ y_k \} )), \forall x_j$
	
- Busca na intersecção:

	$C(x_j, \mathcal{U}(\{y_k\}) \cap \mathcal{U}(\{x_j\})), \forall x_j$
	
	Se $\exists y_k$ tal que $y_k \notin \{x_j\}$ (ou até mesmo tendo que $| \mathcal{U}( \{ y_k \} ) \cap \mathcal{U}( \{ x_j \} ) | \leq | \mathcal{U}( \{y_k \} ) \cup \mathcal{U}( \{x_j \} ) | $)
	
	Então $|\mathcal{U}(\{y_k\}) \cap \mathcal{U}(\{x_j\})| \leq | \mathcal{U}(\{y_k\})|$
	
	E por fim:
	
	$ O(C(x_j, \mathcal{U}(\{y_k\}) \cap \mathcal{U}(\{x_j\}))) \leq O(C(x_j, \mathcal{U}(\{y_k\}))) \leq O(C(x_j, \{y_k\}))$
	

## Obs:
- se não estivéssemos preocupados em guardar o índice, obviamente poderíamos considerar sempre os elementos sem repetição de $\{x_j\}$, i.e., $\mathcal{U}(\{x_j\})$


In [75]:
from IPython.display import display
import time
import pandas as pd
import numpy as np
import random

In [68]:
def unique(l):
  aux = []
  for i in l:
    if i not in aux:
      aux.append(i)
  return aux

class ElementSearch:
  def __init__(self, x, y):
    self.x = x
    self.y = y
    self.intersection = list(set(x)&set(y))
    self.u_x = unique(x)
    self.u_y = unique(y)

  # (I) brute search
  def brute_search(self):
    x,y = self.x, self.y
    res=[]
    t0=time.time()
    for x_j in x:
      N = len(y)
      if x_j in y:
        res.append('Found')
      else:
        res.append('Not Found')
    dt = time.time() - t0
    return res, N, dt

  # (II) unique search
  def unique_search(self):
    x = self.x
    u_y = self.u_y
    res=[]
    t0=time.time()
    for x_j in x:
      N = len(u_y)
      if x_j in u_y:
        
        res.append('Found')
      else:
        res.append('Not Found')
    dt= time.time() - t0
    return res, N, dt

  # (III) intersection search
  def intersection_search(self):
    res=[]
    u_x = self.u_x
    u_y = self.u_y
    intersection = self.intersection

    t0=time.time()
    for x_j in x:
      N = len(intersection)
      if x_j in intersection:
        res.append('Found')
      else:
        res.append('Not Found')
    dt= time.time() - t0
    return res, N, dt

  
  def results(self, verbose=False):
    results ={
        'res': [],
        'y_size': [],
        'dt (s)': [],
        'algorithm':[]
    }
    algo_dict = {
        self.brute_search:'brute',
        self.unique_search:'unique',
        self.intersection_search:'intersection'
    }
    for algo in [self.brute_search, self.unique_search, self.intersection_search]:
      res, N, dt= algo()
      if verbose:
        print('res: {}\n c: {} \n dt: {}'.format(res, c, dt))
      results['res'].append(res)
      results['y_size'].append(N)
      results['dt (s)'].append(dt)
      results['algorithm'].append(algo_dict[algo])

    df = pd.DataFrame(results)
    df = df.sort_values(['dt (s)'], ascending=False)
    display(df)


x = ['a','a','b','c','c','c','d','e']
y = ['a','d','d','d','d','f','g']

ElementSearch(x,y).results()

Unnamed: 0,res,y_size,dt (s),algorithm
0,"[Found, Found, Not Found, Not Found, Not Found...",7,3e-06,brute
1,"[Found, Found, Not Found, Not Found, Not Found...",4,2e-06,unique
2,"[Found, Found, Not Found, Not Found, Not Found...",2,2e-06,intersection


In [None]:
x = ['a','a','b','c','c','c','d','e']
y = ['a','d','d','d','d','f','g']

ElementSearch(x,y).results()

In [71]:
v1 = np.random.rand(100)
v2 = np.random.rand(100)
ElementSearch(v1,v2).results()

Unnamed: 0,res,y_size,dt (s),algorithm
0,"[Not Found, Not Found, Not Found, Not Found, N...",100,0.000799,brute
1,"[Not Found, Not Found, Not Found, Not Found, N...",100,0.000444,unique
2,"[Not Found, Not Found, Not Found, Not Found, N...",0,5e-06,intersection


In [76]:
random.randint(0,10)

10

In [80]:
## name search
names = ['albert', 'bolzano', 'cauchy', "d'alambert", 'euclid', 'fourier', 'gauss', 'hilbert', 'isaac', 'jacobian', 'kolmogorov', 'leonard',
         'maxwell', 'newton', 'oparin', 'planck', 'quentin', 'ramanujan', 'srinivasa', 'tchaikowsky', 'ursula', 'vinci', 'xenonium', 'f(x)', 'zigmund']
def generate_name_vector(N, names=names):
  return [names[random.randint(0, len(names)-1)] for k in range(N)]

generate_name_vector(N=100)

["d'alambert",
 'fourier',
 'fourier',
 'bolzano',
 'zigmund',
 'quentin',
 'kolmogorov',
 'maxwell',
 'newton',
 'vinci',
 'euclid',
 'maxwell',
 'ramanujan',
 'ramanujan',
 'srinivasa',
 'albert',
 'euclid',
 "d'alambert",
 'vinci',
 'tchaikowsky',
 'zigmund',
 'bolzano',
 'tchaikowsky',
 'srinivasa',
 "d'alambert",
 'f(x)',
 'newton',
 'ursula',
 'planck',
 'tchaikowsky',
 'vinci',
 'ramanujan',
 'bolzano',
 'kolmogorov',
 'xenonium',
 'vinci',
 'leonard',
 'hilbert',
 'bolzano',
 'newton',
 'albert',
 'srinivasa',
 'oparin',
 'jacobian',
 'newton',
 'srinivasa',
 'ramanujan',
 'maxwell',
 'hilbert',
 'newton',
 'euclid',
 'kolmogorov',
 'hilbert',
 'ursula',
 'ursula',
 'zigmund',
 'tchaikowsky',
 'jacobian',
 'isaac',
 'tchaikowsky',
 'planck',
 'newton',
 'leonard',
 'kolmogorov',
 'gauss',
 'leonard',
 'tchaikowsky',
 'zigmund',
 'oparin',
 'cauchy',
 'kolmogorov',
 'isaac',
 'srinivasa',
 'f(x)',
 'zigmund',
 'gauss',
 'gauss',
 'newton',
 'quentin',
 'zigmund',
 'maxwell',
 'f

In [81]:
v1 = generate_name_vector(N=100)
v2 = generate_name_vector(N=200)
ElementSearch(v1,v2).results()

Unnamed: 0,res,y_size,dt (s),algorithm
0,"[Found, Found, Found, Found, Found, Found, Fou...",200,7.6e-05,brute
1,"[Found, Found, Found, Found, Found, Found, Fou...",25,5.4e-05,unique
2,"[Not Found, Not Found, Not Found, Not Found, N...",25,9e-06,intersection
