# Building fast queries on a CSV


This project is an adaptation of DataQuest's guided project `Building Fast Queries on a CSV`.

First, let's import the libraries needed to build this project.

In [1]:
import csv
from time import time
import random

In [5]:
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [6]:
file = "/content/drive/MyDrive/Eng. Comp.  UFRN/2022.2/EDII/Tarefa 05/the-reddit-climate-change-dataset-comments.csv"

# The Climate class

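Let's start by implementing a class that represents the climate dataset. Its constructor takes the CSV file name as an argument and reads the file into `self.header` and `self.rows`. It also converts the `sentiment` column to `float`, keeps a copy of the rows sorted by sentiment, and builds a dictionary from comment id to row.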
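
As a minimal sketch of what the constructor does, the snippet below reads a small in-memory CSV with `csv.reader`, splits off the header and applies the same rule of turning an empty sentiment into `0.0`. The sample rows are invented (same column layout as the dataset, not real data) and the helper `read_header_and_rows` is hypothetical. The second row also illustrates why the Python `csv` documentation recommends `newline=''`: a quoted comment body may contain commas and line breaks.

```python
import csv
import io

# Invented sample rows in the dataset's column layout (not real data).
SAMPLE = (
    "type,id,subreddit.id,subreddit.name,subreddit.nsfw,created_utc,"
    "permalink,body,sentiment,score\n"
    "comment,aaa111,sub1,news,false,1661990000,https://example.com/a,"
    '"A body, with a comma",0.25,3\n'
    "comment,bbb222,sub2,ohio,false,1661990001,https://example.com/b,"
    '"Line one\nline two",,1\n'
)

def read_header_and_rows(text):
    """Hypothetical helper: split CSV text into (header, rows)."""
    rows = list(csv.reader(io.StringIO(text, newline="")))
    header, data_rows = rows[0], rows[1:]
    col = header.index("sentiment")
    for row in data_rows:
        # Same rule as Climate.__init__: an empty sentiment becomes 0.0
        row[col] = float(row[col]) if row[col] != "" else 0.0
    return header, data_rows

header, rows = read_header_and_rows(SAMPLE)
print(rows[0][7])  # A body, with a comma
print(rows[1][8])  # 0.0
```

Because the quoted newline is handled by the reader, the two data lines still produce exactly two rows.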

In [8]:
# Sort key used in __init__: column 8 holds the sentiment score
def row_sentiment(row):
    return row[8]

class Climate:
    def __init__(self, csv_filename):
        '''Load the CSV file and build the structures used by the queries.'''
        # newline='' lets the csv module handle line breaks inside quoted comment bodies
        with open(csv_filename, newline='', encoding='utf-8') as f:
            reader = csv.reader(f)
            rows = list(reader)
        # Separate the header from the data rows
        self.header = rows[0]
        self.rows = rows[1:]

        self.dict_s = {}     # maps each score (column 9) to the last row seen with that score
        self.id_to_row = {}  # maps each comment id (column 1) to its row
        for row in self.rows:
            # Convert column 8 (sentiment) to float; empty values become 0.0
            row[8] = float(row[8]) if row[8] != '' else 0.0
            self.dict_s[int(row[9])] = row
            self.id_to_row[row[1]] = row
        # Copy of the rows sorted by sentiment, used by the binary search
        self.sentiment_to_rows = sorted(self.rows, key=row_sentiment)

    def get_row_id_search(self, id_search):
        '''
        Linear search: scan every row and return the first one whose id
        (column 1) equals id_search, or None if no row matches.
        '''
        for row in self.rows:
            if row[1] == id_search:
                return row
        return None

    def get_row_in_dic(self, id_search):
        '''
        Dictionary lookup: return the row whose id is id_search,
        or None if the id is not found.
        '''
        return self.id_to_row.get(id_search)

    def mensagem_range(self, sentiment, rs=0):
        '''
        Binary search over self.sentiment_to_rows, starting at index rs, for a
        row whose sentiment equals `sentiment`. Returns the index of a matching
        row (not necessarily the first one when there are duplicates), or -1
        if no row has that sentiment.
        '''
        r_start = rs
        r_end = len(self.sentiment_to_rows) - 1

        while r_start < r_end:
            range_middle = (r_end + r_start) // 2
            value = self.sentiment_to_rows[range_middle][8]
            if value == sentiment:
                return range_middle
            elif value < sentiment:
                r_start = range_middle + 1
            else:
                r_end = range_middle - 1
        # At most one candidate index is left; an empty range means "not found"
        # (this also avoids an IndexError on an empty list or rs past the end)
        if r_start <= r_end and self.sentiment_to_rows[r_start][8] == sentiment:
            return r_start
        return -1
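
`mensagem_range` returns the index of *a* matching row, which is not necessarily the first one, and many comments share the same sentiment value (every empty sentiment, for instance, becomes `0.0`). One possible way to answer range queries on `sentiment_to_rows` is the standard library's `bisect` module, swapped in here in place of the hand-written loop. The function `rows_in_sentiment_range` and the toy rows are hypothetical; the sketch only assumes the rows are already sorted by column 8, as `sentiment_to_rows` is.

```python
from bisect import bisect_left, bisect_right

def rows_in_sentiment_range(sorted_rows, low, high, col=8):
    """Hypothetical helper: every row whose column `col` lies in [low, high].

    Assumes sorted_rows is already sorted by that column.
    """
    keys = [row[col] for row in sorted_rows]  # parallel list of sort keys
    start = bisect_left(keys, low)    # first index with key >= low
    end = bisect_right(keys, high)    # first index with key > high
    return sorted_rows[start:end]

def fake_row(row_id, sentiment):
    """Hypothetical 10-column row: only id (col 1) and sentiment (col 8) are set."""
    row = [""] * 10
    row[1], row[8] = row_id, sentiment
    return row

data = sorted(
    [fake_row("a", -0.9), fake_row("b", 0.0), fake_row("c", 0.0),
     fake_row("d", 0.3), fake_row("e", 0.8)],
    key=lambda r: r[8],
)
print([r[1] for r in rows_in_sentiment_range(data, 0.0, 0.5)])  # ['b', 'c', 'd']
print([r[1] for r in rows_in_sentiment_range(data, 0.9, 1.0)])  # []
```

`bisect_left(keys, value)` also gives the first occurrence of an exact sentiment value. Rebuilding `keys` costs O(n) per call; computing it once in `__init__` would keep each query at O(log n + k) for k matching rows.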

In [9]:
climate = Climate(file)

In [10]:
print(climate.header)
for i in climate.rows[0:5]:
  print(i)

['type', 'id', 'subreddit.id', 'subreddit.name', 'subreddit.nsfw', 'created_utc', 'permalink', 'body', 'sentiment', 'score']
['comment', 'imlddn9', '2qh3l', 'news', 'false', '1661990368', 'https://old.reddit.com/r/news/comments/x2cszk/us_life_expectancy_down_for_secondstraight_year/imlddn9/', 'Yeah but what the above commenter is saying is their base doesn’t want any of that. They detest all of those things, even the small gradual changes. Investing in nuclear energy is a tacit acknowledgement of man made climate change. Any acknowledgement or concession and they will be primaried out in a minute', 0.5719, '2']
['comment', 'imldbeh', '2qn7b', 'ohio', 'false', '1661990340', 'https://old.reddit.com/r/Ohio/comments/x2awnp/state_government_may_soon_kill_a_solar_project_in/imldbeh/', "Any comparison of efficiency between solar and fossil fuels is nonsensical at best and intentionally misleading at worst. In no universe is light -&gt; photovoltaic cell -&gt; electricity less efficient than l

In [11]:
climate.rows[1][8]

-0.9877

In [12]:
climate.get_row_id_search('imlddn9')

['comment',
 'imlddn9',
 '2qh3l',
 'news',
 'false',
 '1661990368',
 'https://old.reddit.com/r/news/comments/x2cszk/us_life_expectancy_down_for_secondstraight_year/imlddn9/',
 'Yeah but what the above commenter is saying is their base doesn’t want any of that. They detest all of those things, even the small gradual changes. Investing in nuclear energy is a tacit acknowledgement of man made climate change. Any acknowledgement or concession and they will be primaried out in a minute',
 0.5719,
 '2']

In [13]:
climate.get_row_in_dic('imlddn9')

['comment',
 'imlddn9',
 '2qh3l',
 'news',
 'false',
 '1661990368',
 'https://old.reddit.com/r/news/comments/x2cszk/us_life_expectancy_down_for_secondstraight_year/imlddn9/',
 'Yeah but what the above commenter is saying is their base doesn’t want any of that. They detest all of those things, even the small gradual changes. Investing in nuclear energy is a tacit acknowledgement of man made climate change. Any acknowledgement or concession and they will be primaried out in a minute',
 0.5719,
 '2']
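
Both calls return the same row; the difference is cost. `get_row_id_search` scans the rows one by one (O(n)), while `get_row_in_dic` does a single dictionary lookup (O(1) on average). The notebook imports `time` and `random`, presumably for a comparison like this one. The sketch below times both strategies on synthetic ids rather than the Reddit data, so absolute numbers will differ; `make_ids` and `linear_lookup` are hypothetical names.

```python
import random
import string
import time

def make_ids(n, seed=0):
    """Generate n distinct, hypothetical 7-character ids (not real Reddit ids)."""
    rng = random.Random(seed)
    ids = set()
    while len(ids) < n:
        ids.add("".join(rng.choices(string.ascii_lowercase + string.digits, k=7)))
    return sorted(ids)

ids = make_ids(20_000)
rows = [["comment", row_id] for row_id in ids]  # id in column 1, as in the dataset
id_to_row = {row[1]: row for row in rows}       # same idea as Climate.id_to_row

def linear_lookup(target):
    """Linear scan, mirroring get_row_id_search."""
    for row in rows:
        if row[1] == target:
            return row
    return None

queries = random.Random(1).sample(ids, 200)

start = time.perf_counter()
linear_results = [linear_lookup(q) for q in queries]
middle = time.perf_counter()
dict_results = [id_to_row.get(q) for q in queries]
end = time.perf_counter()

print(f"linear scan: {middle - start:.4f} s | dict lookup: {end - middle:.6f} s")
```

`time.perf_counter` is used instead of `time.time` because it is the clock intended for measuring short intervals.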