# Preprocessing
This **Jupyter Notebook** applies **preprocessing** to the dataset (or to part of it).

# Summary of the preliminary analysis
In the previous step we ran a brief analysis of the dataset. The **summary** of that analysis was:

 - We have a large dataset to work with:
   - 244,768 samples and 12 columns (attributes/fields/features).
 - However, most columns will need preprocessing, because they hold free text.
 - Some columns have many missing values, most notably **ContractType**, with about **73%** missing.
 - Statistics for the **"SalaryNormalized"** variable (feature):
   - The lowest annual salary was 5,000;
   - The highest annual salary was 200,000;
   - The mean annual salary was 34,122;
   - The median (2nd quartile, the value that splits the data in half) was 30,000:
     - Note that the median is not far from the mean.
   - The standard deviation (how far, on average, the values are spread around the mean) is 17,640.
   - The mode (the most frequent salary) was 35,000, with 9,178 samples.
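
As a quick refresher on what these summary statistics measure, here is a minimal sketch using Python's standard `statistics` module. The `salaries` list is made-up toy data, not values taken from the dataset.

```python
import statistics

# Hypothetical toy sample of annual salaries (not real dataset values).
salaries = [25000, 30000, 30000, 27500, 25000, 35000, 35000, 35000]

mean_salary = statistics.mean(salaries)      # Arithmetic average.
median_salary = statistics.median(salaries)  # Middle value of the sorted data.
mode_salary = statistics.mode(salaries)      # Most frequent value.
std_salary = statistics.stdev(salaries)      # Sample standard deviation.

print(mean_salary, median_salary, mode_salary, round(std_salary, 2))
```

As in the real data, the mean and the median of this toy sample sit close together while the mode is a bit higher.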

# The "Preprocessing" class
One of **GRIA**'s requirements for the challenge was that code be reusable. This avoids duplicated code and lets the same code be reused in future work.

In [1]:
class Preprocessing:

  def install_dependencies(self):
    !pip install --upgrade -r ../requirements.txt

**NOTE:**
Now that we have our **Preprocessing** class for code reuse, let's start by creating an `instance` of it.
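
To illustrate how such a class can hold reusable steps, here is a minimal sketch. The class name `TextCleaner` and its methods are hypothetical and are not part of the notebook's `Preprocessing` class; they only mirror the kind of text cleaning applied later in this notebook.

```python
import re


class TextCleaner:
    """Hypothetical helper that groups reusable text-cleaning steps."""

    def lower(self, text: str) -> str:
        return text.lower()

    def remove_punctuation(self, text: str) -> str:
        # Replace anything that is not a word character or whitespace.
        return re.sub(r"[^\w\s]", " ", text)

    def remove_numbers(self, text: str) -> str:
        return re.sub(r"[0-9]+", "", text)

    def clean(self, text: str) -> str:
        # Chain the steps and collapse repeated whitespace at the end.
        for step in (self.lower, self.remove_punctuation, self.remove_numbers):
            text = step(text)
        return " ".join(text.split())


cleaner = TextCleaner()
print(cleaner.clean("Engineering Systems Analyst / Mathematical Modeller"))
```

Keeping each step as its own method means the same instance can be applied to any text column, which is the kind of reuse the requirement asks for.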

---

# 01 - Downloading & importing the required libraries

In [2]:
preprocessing = Preprocessing()  # Instance.
# preprocessing.install_dependencies()  # Uncomment to install the dependencies.

Now let's import the required libraries:

In [3]:
import pandas as pd
import py7zr

Now let's extract the dataset:

In [4]:
with py7zr.SevenZipFile("../datasets/Train_rev1.7z", mode='r') as archive:
  archive.extractall(path="/tmp") # For Linux users.

**NOTE:**  
Because the dataset is very large, I downloaded the most compressed version (**.7z**). I also chose to extract it to a temporary location (the **/tmp** directory, since I am on Linux), which works like a **staging area**.
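
Since `/tmp` only exists on Unix-like systems, one portable alternative is to ask Python for the platform's temporary directory. This is a sketch, not what the notebook does; `staging_dir` and `csv_path` are illustrative names.

```python
import tempfile
from pathlib import Path

# Platform-specific temporary directory (for example /tmp on Linux).
staging_dir = Path(tempfile.gettempdir())

# Expected location of the extracted CSV inside the staging area.
csv_path = staging_dir / "Train_rev1.csv"

print(staging_dir, csv_path)
```

`staging_dir` could then be passed as `path=` to `archive.extractall`, and `csv_path` to `pd.read_csv`.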

**Configuring the output size:**  
Before starting the analysis, let's configure Pandas to display the full content of each sample:

In [5]:
pd.options.display.max_colwidth = 100000

Finally, let's load the extracted dataset:

In [6]:
full_df = pd.read_csv("/tmp/Train_rev1.csv")

# 02 - Overview of the dataset
Since we already did a **Preliminary Analysis** of the dataset and will work on each variable (feature) individually, here we only display the dataset's general information with the **info()** method from *Pandas*.

In [7]:
full_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 244768 entries, 0 to 244767
Data columns (total 12 columns):
 #   Column              Non-Null Count   Dtype 
---  ------              --------------   ----- 
 0   Id                  244768 non-null  int64 
 1   Title               244767 non-null  object
 2   FullDescription     244768 non-null  object
 3   LocationRaw         244768 non-null  object
 4   LocationNormalized  244768 non-null  object
 5   ContractType        65442 non-null   object
 6   ContractTime        180863 non-null  object
 7   Company             212338 non-null  object
 8   Category            244768 non-null  object
 9   SalaryRaw           244768 non-null  object
 10  SalaryNormalized    244768 non-null  int64 
 11  SourceName          244767 non-null  object
dtypes: int64(2), object(10)
memory usage: 22.4+ MB


---

# 03 - Applying preprocessing to the columns (features)
In this step we apply **preprocessing** to each column individually.

---

## 03.1 - Preprocessing the "Id" column (feature)
> This column (feature) does not need preprocessing. It is simply the unique identifier of each sample.

---

## 03.2 - Preprocessing the "Title" column (feature)
> In short, **Title** is a summary of the *position* or *role*.

### Preparing the "Title" *column (feature)* and casting it to the most suitable data type:

In [8]:
df_Title = full_df[["Title"]].copy()
df_Title = df_Title.astype({'Title': 'string'})
df_Title.info()
df_Title.head()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 244768 entries, 0 to 244767
Data columns (total 1 columns):
 #   Column  Non-Null Count   Dtype 
---  ------  --------------   ----- 
 0   Title   244767 non-null  string
dtypes: string(1)
memory usage: 1.9 MB


Unnamed: 0,Title
0,Engineering Systems Analyst
1,Stress Engineer Glasgow
2,Modelling and simulation analyst
3,Engineering Systems Analyst / Mathematical Modeller
4,"Pioneer, Miser Engineering Systems Analyst"


### Checking what percentage (%) of the data is missing:

Let's start by checking the **number** of missing values in the **Title** column (feature):

In [9]:
# Data missing sum.
missing = df_Title.isnull().sum()
missing

Title    1
dtype: int64

Of the 244,768 samples, only one is missing its **title**. Let's see what percentage that single missing title represents:

In [10]:
# Data missing in percent.
percentMissing = (missing / len(df_Title.index)) * 100
percentMissing

Title    0.000409
dtype: float64

**NOTE:**  
Now comes the key question:

> **Why is only one of the samples missing its title?**
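
The notebook does not answer this question or say how the row is handled. One way to investigate, sketched below on a toy frame, is to look at the offending row and then decide how to treat it. Filling with an empty string is just one option, assumed here for illustration; `toy_df` is made-up data.

```python
import pandas as pd

# Hypothetical toy data with one missing title.
toy_df = pd.DataFrame({"Title": ["Stress Engineer Glasgow", None, "Sales Manager"]})

# Rows where the title is missing, to inspect them by hand.
missing_rows = toy_df[toy_df["Title"].isna()]
print(missing_rows)

# One possible treatment: replace the missing title with an empty string,
# so that later string operations never see a missing value.
toy_df["Title"] = toy_df["Title"].fillna("")
print(toy_df["Title"].isna().sum())
```

This matters for the later steps: helpers that call `str(text)` on a missing value turn it into the literal text of the placeholder (such as `<NA>`) instead of skipping it.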

### Applying lower casing:

In [11]:
df_Title["processed_title"] = df_Title["Title"].str.lower()
df_Title.head()

Unnamed: 0,Title,processed_title
0,Engineering Systems Analyst,engineering systems analyst
1,Stress Engineer Glasgow,stress engineer glasgow
2,Modelling and simulation analyst,modelling and simulation analyst
3,Engineering Systems Analyst / Mathematical Modeller,engineering systems analyst / mathematical modeller
4,"Pioneer, Miser Engineering Systems Analyst","pioneer, miser engineering systems analyst"


### Removing punctuation:

In [12]:
df_Title["processed_title"] = df_Title["processed_title"].str.replace(r'[^\w\s]', ' ', regex=True)
df_Title.head()

Unnamed: 0,Title,processed_title
0,Engineering Systems Analyst,engineering systems analyst
1,Stress Engineer Glasgow,stress engineer glasgow
2,Modelling and simulation analyst,modelling and simulation analyst
3,Engineering Systems Analyst / Mathematical Modeller,engineering systems analyst mathematical modeller
4,"Pioneer, Miser Engineering Systems Analyst",pioneer miser engineering systems analyst
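
One detail of this step: because each punctuation mark is replaced by a space, a separator such as ` / ` leaves a run of spaces behind, even if the displayed output does not make that obvious. A small sketch with the standard `re` module shows the effect and one way to collapse the extra whitespace; the `collapse_spaces` helper is hypothetical.

```python
import re

title = "engineering systems analyst / mathematical modeller"

# Same pattern as the cell above: every non-word, non-space character becomes a space.
no_punct = re.sub(r"[^\w\s]", " ", title)


def collapse_spaces(text: str) -> str:
    """Hypothetical helper: squeeze whitespace runs into single spaces."""
    return re.sub(r"\s+", " ", text).strip()


print(repr(no_punct))
print(repr(collapse_spaces(no_punct)))
```

The later stopword and rare-word steps rebuild each title with `" ".join(... .split())`, which has the same collapsing effect.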


### Removing numbers:

In [13]:
df_Title["processed_title"] = df_Title["processed_title"].str.replace('[0-9]+', '', regex=True)
df_Title.head()

Unnamed: 0,Title,processed_title
0,Engineering Systems Analyst,engineering systems analyst
1,Stress Engineer Glasgow,stress engineer glasgow
2,Modelling and simulation analyst,modelling and simulation analyst
3,Engineering Systems Analyst / Mathematical Modeller,engineering systems analyst mathematical modeller
4,"Pioneer, Miser Engineering Systems Analyst",pioneer miser engineering systems analyst


### Applying stopword removal (removing irrelevant words):

In [14]:
from nltk.corpus import stopwords  # Requires a one-time nltk.download('stopwords').
", ".join(stopwords.words('english'))

"i, me, my, myself, we, our, ours, ourselves, you, you're, you've, you'll, you'd, your, yours, yourself, yourselves, he, him, his, himself, she, she's, her, hers, herself, it, it's, its, itself, they, them, their, theirs, themselves, what, which, who, whom, this, that, that'll, these, those, am, is, are, was, were, be, been, being, have, has, had, having, do, does, did, doing, a, an, the, and, but, if, or, because, as, until, while, of, at, by, for, with, about, against, between, into, through, during, before, after, above, below, to, from, up, down, in, out, on, off, over, under, again, further, then, once, here, there, when, where, why, how, all, any, both, each, few, more, most, other, some, such, no, nor, not, only, own, same, so, than, too, very, s, t, can, will, just, don, don't, should, should've, now, d, ll, m, o, re, ve, y, ain, aren, aren't, couldn, couldn't, didn, didn't, doesn, doesn't, hadn, hadn't, hasn, hasn't, haven, haven't, isn, isn't, ma, mightn, mightn't, mustn, mus

In [15]:
STOPWORDS = set(stopwords.words('english'))
def remove_stopwords(text):
  return " ".join([word for word in str(text).split() if word not in STOPWORDS])

df_Title["processed_title"] = df_Title["processed_title"].apply(lambda text: remove_stopwords(text))
df_Title.head()

Unnamed: 0,Title,processed_title
0,Engineering Systems Analyst,engineering systems analyst
1,Stress Engineer Glasgow,stress engineer glasgow
2,Modelling and simulation analyst,modelling simulation analyst
3,Engineering Systems Analyst / Mathematical Modeller,engineering systems analyst mathematical modeller
4,"Pioneer, Miser Engineering Systems Analyst",pioneer miser engineering systems analyst


### Removing the most frequent words:

In [16]:
from collections import Counter

cnt_title = Counter() # Instance
for text in df_Title["processed_title"].values:
  for word in text.split():
    cnt_title[word] += 1

cnt_title.most_common(10)

[('manager', 50162),
 ('engineer', 24192),
 ('sales', 19769),
 ('senior', 16976),
 ('developer', 13895),
 ('assistant', 12179),
 ('k', 11057),
 ('executive', 10632),
 ('business', 9988),
 ('consultant', 9496)]

**NOTE:**  
In my opinion almost all of these words, if not all (except "k"), are relevant for the model to learn from, so I will not remove any of them.
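
The source does not say where the token "k" comes from. One plausible explanation, shown in the sketch below, is salary shorthand such as "40k": once the digits are removed, only the letter remains. The sample title and the `drop_single_letters` helper are hypothetical.

```python
import re

title = "java developer 40k"

# Same digit-removal pattern used earlier in this notebook.
without_numbers = re.sub(r"[0-9]+", "", title)


def drop_single_letters(text: str) -> str:
    """Hypothetical helper: discard tokens made of a single character."""
    return " ".join(word for word in text.split() if len(word) > 1)


print(without_numbers)
print(drop_single_letters(without_numbers))
```

If such leftovers turn out to be noise, a filter like this could be applied before counting words.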

### Removing rare words:

In [17]:
n_rare_words = 10
RAREWORDS = set([w for (w, wc) in cnt_title.most_common()[:-n_rare_words-1:-1]])
RAREWORDS

{'bellhill',
 'constructions',
 'hydrolic',
 'improvemen',
 'leadopportunity',
 'mlnlycke',
 'norley',
 'tase',
 'techniciancivil',
 'tuiton'}

In [18]:
n_rare_words = 10
RAREWORDS = set([w for (w, wc) in cnt_title.most_common()[:-n_rare_words-1:-1]])
def remove_rarewords(text):
  return " ".join([word for word in str(text).split() if word not in RAREWORDS])

df_Title["processed_title"] = df_Title["processed_title"].apply(lambda text: remove_rarewords(text))
df_Title.head()

Unnamed: 0,Title,processed_title
0,Engineering Systems Analyst,engineering systems analyst
1,Stress Engineer Glasgow,stress engineer glasgow
2,Modelling and simulation analyst,modelling simulation analyst
3,Engineering Systems Analyst / Mathematical Modeller,engineering systems analyst mathematical modeller
4,"Pioneer, Miser Engineering Systems Analyst",pioneer miser engineering systems analyst
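
The slice `most_common()[:-n_rare_words-1:-1]` can look cryptic: it walks the frequency-sorted list backwards and takes the last `n_rare_words` entries. The sketch below reproduces it on a made-up `Counter` and also shows a count-based threshold as an alternative, which is an assumption of this example rather than something the notebook does.

```python
from collections import Counter

# Hypothetical word counts.
counts = Counter({"manager": 5, "engineer": 4, "sales": 3, "tuiton": 1, "norley": 1})

n_rare_words = 2

# Last n entries of the frequency-sorted list, read from the end.
rare_by_slice = {word for word, _ in counts.most_common()[:-n_rare_words - 1:-1]}

# Alternative: every word seen at most `max_count` times.
max_count = 1
rare_by_threshold = {word for word, count in counts.items() if count <= max_count}

print(rare_by_slice, rare_by_threshold)
```

With a fixed `n_rare_words`, words tied at the same low count are cut off arbitrarily; a threshold treats all of them alike.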


### Applying stemming:

In [19]:
from nltk.stem.porter import PorterStemmer

stemmer = PorterStemmer() # Instance.
def stem_words(text):
  return " ".join([stemmer.stem(word) for word in text.split()])

df_Title["processed_title"] = df_Title["processed_title"].apply(lambda text: stem_words(text))
df_Title.head()

Unnamed: 0,Title,processed_title
0,Engineering Systems Analyst,engin system analyst
1,Stress Engineer Glasgow,stress engin glasgow
2,Modelling and simulation analyst,model simul analyst
3,Engineering Systems Analyst / Mathematical Modeller,engin system analyst mathemat model
4,"Pioneer, Miser Engineering Systems Analyst",pioneer miser engin system analyst


### Applying lemmatization + part-of-speech tagging:

In [20]:
from nltk.stem import WordNetLemmatizer
from nltk.corpus import wordnet
import nltk

lemmatizer = WordNetLemmatizer() # Instance
wordnet_map = {"N":wordnet.NOUN, "V":wordnet.VERB, "J":wordnet.ADJ, "R":wordnet.ADV} # Apply dict mapping.

# Lemmatize words function.
def lemmatize_words(text):
  pos_tagged_text = nltk.pos_tag(text.split())
  return " ".join([lemmatizer.lemmatize(word, wordnet_map.get(pos[0], wordnet.NOUN)) for word, pos in pos_tagged_text])

df_Title["processed_title"] = df_Title["processed_title"].apply(lambda text: lemmatize_words(text))
df_Title.head()

Unnamed: 0,Title,processed_title
0,Engineering Systems Analyst,engin system analyst
1,Stress Engineer Glasgow,stress engin glasgow
2,Modelling and simulation analyst,model simul analyst
3,Engineering Systems Analyst / Mathematical Modeller,engin system analyst mathemat model
4,"Pioneer, Miser Engineering Systems Analyst",pioneer miser engin system analyst


### Applying Count Vectorizer:

In [21]:
from sklearn.feature_extraction.text import CountVectorizer

vectorizer = CountVectorizer() # Instance.
df_title_vectorized = vectorizer.fit_transform(df_Title["processed_title"])

In [22]:
df_title_vectorized

<244768x14917 sparse matrix of type '<class 'numpy.int64'>'
	with 923171 stored elements in Compressed Sparse Row format>
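
To make the resulting matrix less abstract: each row is a title and each column is a vocabulary term, with the cell holding how often the term occurs in that title. The plain-Python sketch below illustrates the idea on two stemmed titles from the output above; it is not scikit-learn's implementation, and it builds a dense list of lists instead of a sparse matrix.

```python
from collections import Counter

docs = ["engin system analyst", "stress engin glasgow"]

# Vocabulary: every distinct token, sorted alphabetically.
vocabulary = sorted({token for doc in docs for token in doc.split()})

# One row per document, one column per vocabulary term.
matrix = []
for doc in docs:
    token_counts = Counter(doc.split())
    matrix.append([token_counts[term] for term in vocabulary])

print(vocabulary)
print(matrix)
```

Note that `CountVectorizer`'s default token pattern keeps only tokens with two or more characters, so a token like "k" does not become a column.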

---

## 03.3 - Preprocessing the "SalaryNormalized" column (feature)
> It has the same meaning as the **"SalaryRaw"** column, but **Adzuna** normalized the values so they are expressed on an annual basis.

In [24]:
df_SalaryNormalized = full_df[["SalaryNormalized"]]
df_SalaryNormalized.info()
df_SalaryNormalized.head()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 244768 entries, 0 to 244767
Data columns (total 1 columns):
 #   Column            Non-Null Count   Dtype
---  ------            --------------   -----
 0   SalaryNormalized  244768 non-null  int64
dtypes: int64(1)
memory usage: 1.9 MB


Unnamed: 0,SalaryNormalized
0,25000
1,30000
2,30000
3,27500
4,25000


### Checking what percentage (%) of the data is missing:

In [25]:
# Data missing sum.
missing = df_SalaryNormalized.isnull().sum()
missing

SalaryNormalized    0
dtype: int64

In [26]:
# Data missing in percent.
percentMissing = (missing / len(df_SalaryNormalized.index)) * 100
percentMissing

SalaryNormalized    0.0
dtype: float64

**NOTE:**  
This is the **target** variable. For now, we will use it as normalized by **Adzuna**, without checking whether the normalization was done well. The goal of this approach is to have something available for the **training** and **validation** stage as quickly as possible.

---

## 03.4 - Preprocessing the "FullDescription" column (feature)
> The full text of the job ad, as provided by the job advertiser.

**NOTE:**  
Wherever a salary appeared, the values were stripped from the description to ensure that no salary information shows up in it. There may be some collateral damage here, since other numbers were removed as well.
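
In the descriptions shown below, the removed values appear as runs of asterisks (for example `****K`). As a rough sketch, such placeholders can be counted with a regular expression; the pattern is an assumption based on the few samples visible in this notebook and may not cover every masking variant.

```python
import re

description = "Stress Engineer Glasgow Salary **** to ****"

# Assumed placeholder format: a run of four asterisks.
MASK_PATTERN = re.compile(r"\*{4}")

masks = MASK_PATTERN.findall(description)
print(len(masks))
```

Counting the placeholders per description is one way to gauge how much numeric information was masked.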

### Preparing the "FullDescription" column (feature) and casting it to the most suitable data type:

In [8]:
df_FullDescription = full_df[["FullDescription"]].copy()
df_FullDescription = df_FullDescription.astype({'FullDescription': 'string'})
df_FullDescription.info()
df_FullDescription.head()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 244768 entries, 0 to 244767
Data columns (total 1 columns):
 #   Column           Non-Null Count   Dtype 
---  ------           --------------   ----- 
 0   FullDescription  244768 non-null  string
dtypes: string(1)
memory usage: 1.9 MB


Unnamed: 0,FullDescription
0,"Engineering Systems Analyst Dorking Surrey Salary ****K Our client is located in Dorking, Surrey and are looking for Engineering Systems Analyst our client provides specialist software development Keywords Mathematical Modelling, Risk Analysis, System Modelling, Optimisation, MISER, PIONEEER Engineering Systems Analyst Dorking Surrey Salary ****K"
1,"Stress Engineer Glasgow Salary **** to **** We re currently looking for talented engineers to join our growing Glasgow team at a variety of levels. The roles are ideally suited to high calibre engineering graduates with any level of appropriate experience, so that we can give you the opportunity to use your technical skills to provide high quality input to our aerospace projects, spanning both aerostructures and aeroengines. In return, you can expect good career opportunities and the chance for advancement and personal and professional development, support while you gain Chartership and some opportunities to possibly travel or work in other offices, in or outside of the UK. The Requirements You will need to have a good engineering degree that includes structural analysis (such as aeronautical, mechanical, automotive, civil) with some experience in a professional engineering environment relevant to (but not limited to) the aerospace sector. You will need to demonstrate experience in at least one or more of the following areas: Structural/stress analysis Composite stress analysis (any industry) Linear and nonlinear finite element analysis Fatigue and damage tolerance Structural dynamics Thermal analysis Aerostructures experience You will also be expected to demonstrate the following qualities: A strong desire to progress quickly to a position of leadership Professional approach Strong communication skills, written and verbal Commercial awareness Team working, being comfortable working in international teams and self managing PLEASE NOTE SECURITY CLEARANCE IS REQUIRED FOR THIS ROLE Stress Engineer Glasgow Salary **** to ****"
2,"Mathematical Modeller / Simulation Analyst / Operational Analyst Basingstoke, Hampshire Up to ****K AAE pension contribution, private medical and dental The opportunity Our client is an independent consultancy firm which has an opportunity for a Data Analyst with 35 years experience. The role will require the successful candidate to demonstrate their ability to analyse a problem and arrive at a solution, with varying levels of data being available. Essential skills Thorough knowledge of Excel and proven ability to utilise this to create powerful decision support models Experience in Modelling and Simulation Techniques, Experience of techniques such as Discrete Event Simulation and/or SD modelling Mathematical/scientific background minimum degree qualified Proven analytical and problem solving skills Self Starter Ability to develop solid working relationships In addition to formal qualifications and experience, the successful candidate will require excellent written and verbal communication skills, be energetic, enterprising and have a determination to succeed. They will be required to build solid working relationships, both internally with colleagues and, most importantly, externally with our clients. They must be comfortable working independently to deliver against challenging client demands. The offices are located in Basingstoke, Hampshire, but our client work for clients worldwide. The successful candidate must therefore be prepared to undertake work at client sites for short periods of time. Physics, Mathematics, Modelling, Simulation, Analytical, Operational Research, Mathematical Modelling Mathematical Modeller / Simulation Analyst / Operational Analyst Basingstoke, Hampshire ****K AAE pension contribution, private medical and dental"
3,"Engineering Systems Analyst / Mathematical Modeller. Our client is a highly successful and respected Consultancy providing specialist software development MISER, PIONEER, Maths, Mathematical, Optimisation, Risk Analysis, Asset Management, Water Industry, Access, Excel, VBA, SQL, Systems . Engineering Systems Analyst / Mathematical Modeller. Salary ****K****K negotiable Location Dorking, Surrey"
4,"Pioneer, Miser Engineering Systems Analyst Dorking Surrey Salary ****K Located in Surrey, our client provides specialist software development Pioneer, Miser Engineering Systems Analyst Dorking Surrey Salary ****K"


### Checking what percentage (%) of the data is missing:

In [9]:
# Data missing sum.
missing = df_FullDescription.isnull().sum()
missing

FullDescription    0
dtype: int64

In [10]:
# Data missing in percent.
percentMissing = (missing / len(df_FullDescription.index)) * 100
percentMissing

FullDescription    0.0
dtype: float64

### Applying lower casing:

In [11]:
df_FullDescription["processed_FullDescription"] = df_FullDescription["FullDescription"].str.lower()
df_FullDescription.head()

Unnamed: 0,FullDescription,processed_FullDescription
0,"Engineering Systems Analyst Dorking Surrey Salary ****K Our client is located in Dorking, Surrey and are looking for Engineering Systems Analyst our client provides specialist software development Keywords Mathematical Modelling, Risk Analysis, System Modelling, Optimisation, MISER, PIONEEER Engineering Systems Analyst Dorking Surrey Salary ****K","engineering systems analyst dorking surrey salary ****k our client is located in dorking, surrey and are looking for engineering systems analyst our client provides specialist software development keywords mathematical modelling, risk analysis, system modelling, optimisation, miser, pioneeer engineering systems analyst dorking surrey salary ****k"
1,"Stress Engineer Glasgow Salary **** to **** We re currently looking for talented engineers to join our growing Glasgow team at a variety of levels. The roles are ideally suited to high calibre engineering graduates with any level of appropriate experience, so that we can give you the opportunity to use your technical skills to provide high quality input to our aerospace projects, spanning both aerostructures and aeroengines. In return, you can expect good career opportunities and the chance for advancement and personal and professional development, support while you gain Chartership and some opportunities to possibly travel or work in other offices, in or outside of the UK. The Requirements You will need to have a good engineering degree that includes structural analysis (such as aeronautical, mechanical, automotive, civil) with some experience in a professional engineering environment relevant to (but not limited to) the aerospace sector. You will need to demonstrate experience in at least one or more of the following areas: Structural/stress analysis Composite stress analysis (any industry) Linear and nonlinear finite element analysis Fatigue and damage tolerance Structural dynamics Thermal analysis Aerostructures experience You will also be expected to demonstrate the following qualities: A strong desire to progress quickly to a position of leadership Professional approach Strong communication skills, written and verbal Commercial awareness Team working, being comfortable working in international teams and self managing PLEASE NOTE SECURITY CLEARANCE IS REQUIRED FOR THIS ROLE Stress Engineer Glasgow Salary **** to ****","stress engineer glasgow salary **** to **** we re currently looking for talented engineers to join our growing glasgow team at a variety of levels. 
the roles are ideally suited to high calibre engineering graduates with any level of appropriate experience, so that we can give you the opportunity to use your technical skills to provide high quality input to our aerospace projects, spanning both aerostructures and aeroengines. in return, you can expect good career opportunities and the chance for advancement and personal and professional development, support while you gain chartership and some opportunities to possibly travel or work in other offices, in or outside of the uk. the requirements you will need to have a good engineering degree that includes structural analysis (such as aeronautical, mechanical, automotive, civil) with some experience in a professional engineering environment relevant to (but not limited to) the aerospace sector. you will need to demonstrate experience in at least one or more of the following areas: structural/stress analysis composite stress analysis (any industry) linear and nonlinear finite element analysis fatigue and damage tolerance structural dynamics thermal analysis aerostructures experience you will also be expected to demonstrate the following qualities: a strong desire to progress quickly to a position of leadership professional approach strong communication skills, written and verbal commercial awareness team working, being comfortable working in international teams and self managing please note security clearance is required for this role stress engineer glasgow salary **** to ****"
2,"Mathematical Modeller / Simulation Analyst / Operational Analyst Basingstoke, Hampshire Up to ****K AAE pension contribution, private medical and dental The opportunity Our client is an independent consultancy firm which has an opportunity for a Data Analyst with 35 years experience. The role will require the successful candidate to demonstrate their ability to analyse a problem and arrive at a solution, with varying levels of data being available. Essential skills Thorough knowledge of Excel and proven ability to utilise this to create powerful decision support models Experience in Modelling and Simulation Techniques, Experience of techniques such as Discrete Event Simulation and/or SD modelling Mathematical/scientific background minimum degree qualified Proven analytical and problem solving skills Self Starter Ability to develop solid working relationships In addition to formal qualifications and experience, the successful candidate will require excellent written and verbal communication skills, be energetic, enterprising and have a determination to succeed. They will be required to build solid working relationships, both internally with colleagues and, most importantly, externally with our clients. They must be comfortable working independently to deliver against challenging client demands. The offices are located in Basingstoke, Hampshire, but our client work for clients worldwide. The successful candidate must therefore be prepared to undertake work at client sites for short periods of time. 
Physics, Mathematics, Modelling, Simulation, Analytical, Operational Research, Mathematical Modelling Mathematical Modeller / Simulation Analyst / Operational Analyst Basingstoke, Hampshire ****K AAE pension contribution, private medical and dental","mathematical modeller / simulation analyst / operational analyst basingstoke, hampshire up to ****k aae pension contribution, private medical and dental the opportunity our client is an independent consultancy firm which has an opportunity for a data analyst with 35 years experience. the role will require the successful candidate to demonstrate their ability to analyse a problem and arrive at a solution, with varying levels of data being available. essential skills thorough knowledge of excel and proven ability to utilise this to create powerful decision support models experience in modelling and simulation techniques, experience of techniques such as discrete event simulation and/or sd modelling mathematical/scientific background minimum degree qualified proven analytical and problem solving skills self starter ability to develop solid working relationships in addition to formal qualifications and experience, the successful candidate will require excellent written and verbal communication skills, be energetic, enterprising and have a determination to succeed. they will be required to build solid working relationships, both internally with colleagues and, most importantly, externally with our clients. they must be comfortable working independently to deliver against challenging client demands. the offices are located in basingstoke, hampshire, but our client work for clients worldwide. the successful candidate must therefore be prepared to undertake work at client sites for short periods of time. 
physics, mathematics, modelling, simulation, analytical, operational research, mathematical modelling mathematical modeller / simulation analyst / operational analyst basingstoke, hampshire ****k aae pension contribution, private medical and dental"
3,"Engineering Systems Analyst / Mathematical Modeller. Our client is a highly successful and respected Consultancy providing specialist software development MISER, PIONEER, Maths, Mathematical, Optimisation, Risk Analysis, Asset Management, Water Industry, Access, Excel, VBA, SQL, Systems . Engineering Systems Analyst / Mathematical Modeller. Salary ****K****K negotiable Location Dorking, Surrey","engineering systems analyst / mathematical modeller. our client is a highly successful and respected consultancy providing specialist software development miser, pioneer, maths, mathematical, optimisation, risk analysis, asset management, water industry, access, excel, vba, sql, systems . engineering systems analyst / mathematical modeller. salary ****k****k negotiable location dorking, surrey"
4,"Pioneer, Miser Engineering Systems Analyst Dorking Surrey Salary ****K Located in Surrey, our client provides specialist software development Pioneer, Miser Engineering Systems Analyst Dorking Surrey Salary ****K","pioneer, miser engineering systems analyst dorking surrey salary ****k located in surrey, our client provides specialist software development pioneer, miser engineering systems analyst dorking surrey salary ****k"


### Removing URLs:

In [12]:
df_FullDescription["processed_FullDescription"] = df_FullDescription["processed_FullDescription"].str.replace(r'\s*https?://\S+(\s+|$)', ' ', regex=True).str.strip()
df_FullDescription.head()

Unnamed: 0,FullDescription,processed_FullDescription
0,"Engineering Systems Analyst Dorking Surrey Salary ****K Our client is located in Dorking, Surrey and are looking for Engineering Systems Analyst our client provides specialist software development Keywords Mathematical Modelling, Risk Analysis, System Modelling, Optimisation, MISER, PIONEEER Engineering Systems Analyst Dorking Surrey Salary ****K","engineering systems analyst dorking surrey salary ****k our client is located in dorking, surrey and are looking for engineering systems analyst our client provides specialist software development keywords mathematical modelling, risk analysis, system modelling, optimisation, miser, pioneeer engineering systems analyst dorking surrey salary ****k"
1,"Stress Engineer Glasgow Salary **** to **** We re currently looking for talented engineers to join our growing Glasgow team at a variety of levels. The roles are ideally suited to high calibre engineering graduates with any level of appropriate experience, so that we can give you the opportunity to use your technical skills to provide high quality input to our aerospace projects, spanning both aerostructures and aeroengines. In return, you can expect good career opportunities and the chance for advancement and personal and professional development, support while you gain Chartership and some opportunities to possibly travel or work in other offices, in or outside of the UK. The Requirements You will need to have a good engineering degree that includes structural analysis (such as aeronautical, mechanical, automotive, civil) with some experience in a professional engineering environment relevant to (but not limited to) the aerospace sector. You will need to demonstrate experience in at least one or more of the following areas: Structural/stress analysis Composite stress analysis (any industry) Linear and nonlinear finite element analysis Fatigue and damage tolerance Structural dynamics Thermal analysis Aerostructures experience You will also be expected to demonstrate the following qualities: A strong desire to progress quickly to a position of leadership Professional approach Strong communication skills, written and verbal Commercial awareness Team working, being comfortable working in international teams and self managing PLEASE NOTE SECURITY CLEARANCE IS REQUIRED FOR THIS ROLE Stress Engineer Glasgow Salary **** to ****","stress engineer glasgow salary **** to **** we re currently looking for talented engineers to join our growing glasgow team at a variety of levels. 
the roles are ideally suited to high calibre engineering graduates with any level of appropriate experience, so that we can give you the opportunity to use your technical skills to provide high quality input to our aerospace projects, spanning both aerostructures and aeroengines. in return, you can expect good career opportunities and the chance for advancement and personal and professional development, support while you gain chartership and some opportunities to possibly travel or work in other offices, in or outside of the uk. the requirements you will need to have a good engineering degree that includes structural analysis (such as aeronautical, mechanical, automotive, civil) with some experience in a professional engineering environment relevant to (but not limited to) the aerospace sector. you will need to demonstrate experience in at least one or more of the following areas: structural/stress analysis composite stress analysis (any industry) linear and nonlinear finite element analysis fatigue and damage tolerance structural dynamics thermal analysis aerostructures experience you will also be expected to demonstrate the following qualities: a strong desire to progress quickly to a position of leadership professional approach strong communication skills, written and verbal commercial awareness team working, being comfortable working in international teams and self managing please note security clearance is required for this role stress engineer glasgow salary **** to ****"
2,"Mathematical Modeller / Simulation Analyst / Operational Analyst Basingstoke, Hampshire Up to ****K AAE pension contribution, private medical and dental The opportunity Our client is an independent consultancy firm which has an opportunity for a Data Analyst with 35 years experience. The role will require the successful candidate to demonstrate their ability to analyse a problem and arrive at a solution, with varying levels of data being available. Essential skills Thorough knowledge of Excel and proven ability to utilise this to create powerful decision support models Experience in Modelling and Simulation Techniques, Experience of techniques such as Discrete Event Simulation and/or SD modelling Mathematical/scientific background minimum degree qualified Proven analytical and problem solving skills Self Starter Ability to develop solid working relationships In addition to formal qualifications and experience, the successful candidate will require excellent written and verbal communication skills, be energetic, enterprising and have a determination to succeed. They will be required to build solid working relationships, both internally with colleagues and, most importantly, externally with our clients. They must be comfortable working independently to deliver against challenging client demands. The offices are located in Basingstoke, Hampshire, but our client work for clients worldwide. The successful candidate must therefore be prepared to undertake work at client sites for short periods of time. 
Physics, Mathematics, Modelling, Simulation, Analytical, Operational Research, Mathematical Modelling Mathematical Modeller / Simulation Analyst / Operational Analyst Basingstoke, Hampshire ****K AAE pension contribution, private medical and dental","mathematical modeller / simulation analyst / operational analyst basingstoke, hampshire up to ****k aae pension contribution, private medical and dental the opportunity our client is an independent consultancy firm which has an opportunity for a data analyst with 35 years experience. the role will require the successful candidate to demonstrate their ability to analyse a problem and arrive at a solution, with varying levels of data being available. essential skills thorough knowledge of excel and proven ability to utilise this to create powerful decision support models experience in modelling and simulation techniques, experience of techniques such as discrete event simulation and/or sd modelling mathematical/scientific background minimum degree qualified proven analytical and problem solving skills self starter ability to develop solid working relationships in addition to formal qualifications and experience, the successful candidate will require excellent written and verbal communication skills, be energetic, enterprising and have a determination to succeed. they will be required to build solid working relationships, both internally with colleagues and, most importantly, externally with our clients. they must be comfortable working independently to deliver against challenging client demands. the offices are located in basingstoke, hampshire, but our client work for clients worldwide. the successful candidate must therefore be prepared to undertake work at client sites for short periods of time. 
physics, mathematics, modelling, simulation, analytical, operational research, mathematical modelling mathematical modeller / simulation analyst / operational analyst basingstoke, hampshire ****k aae pension contribution, private medical and dental"
3,"Engineering Systems Analyst / Mathematical Modeller. Our client is a highly successful and respected Consultancy providing specialist software development MISER, PIONEER, Maths, Mathematical, Optimisation, Risk Analysis, Asset Management, Water Industry, Access, Excel, VBA, SQL, Systems . Engineering Systems Analyst / Mathematical Modeller. Salary ****K****K negotiable Location Dorking, Surrey","engineering systems analyst / mathematical modeller. our client is a highly successful and respected consultancy providing specialist software development miser, pioneer, maths, mathematical, optimisation, risk analysis, asset management, water industry, access, excel, vba, sql, systems . engineering systems analyst / mathematical modeller. salary ****k****k negotiable location dorking, surrey"
4,"Pioneer, Miser Engineering Systems Analyst Dorking Surrey Salary ****K Located in Surrey, our client provides specialist software development Pioneer, Miser Engineering Systems Analyst Dorking Surrey Salary ****K","pioneer, miser engineering systems analyst dorking surrey salary ****k located in surrey, our client provides specialist software development pioneer, miser engineering systems analyst dorking surrey salary ****k"


### Removing punctuation:

In [14]:
# Raw string so that \w and \s reach the regex engine unchanged.
df_FullDescription["processed_FullDescription"] = df_FullDescription["processed_FullDescription"].str.replace(r'[^\w\s]', ' ', regex=True)
df_FullDescription.head()

Unnamed: 0,FullDescription,processed_FullDescription
0,"Engineering Systems Analyst Dorking Surrey Salary ****K Our client is located in Dorking, Surrey and are looking for Engineering Systems Analyst our client provides specialist software development Keywords Mathematical Modelling, Risk Analysis, System Modelling, Optimisation, MISER, PIONEEER Engineering Systems Analyst Dorking Surrey Salary ****K",engineering systems analyst dorking surrey salary k our client is located in dorking surrey and are looking for engineering systems analyst our client provides specialist software development keywords mathematical modelling risk analysis system modelling optimisation miser pioneeer engineering systems analyst dorking surrey salary k
1,"Stress Engineer Glasgow Salary **** to **** We re currently looking for talented engineers to join our growing Glasgow team at a variety of levels. The roles are ideally suited to high calibre engineering graduates with any level of appropriate experience, so that we can give you the opportunity to use your technical skills to provide high quality input to our aerospace projects, spanning both aerostructures and aeroengines. In return, you can expect good career opportunities and the chance for advancement and personal and professional development, support while you gain Chartership and some opportunities to possibly travel or work in other offices, in or outside of the UK. The Requirements You will need to have a good engineering degree that includes structural analysis (such as aeronautical, mechanical, automotive, civil) with some experience in a professional engineering environment relevant to (but not limited to) the aerospace sector. You will need to demonstrate experience in at least one or more of the following areas: Structural/stress analysis Composite stress analysis (any industry) Linear and nonlinear finite element analysis Fatigue and damage tolerance Structural dynamics Thermal analysis Aerostructures experience You will also be expected to demonstrate the following qualities: A strong desire to progress quickly to a position of leadership Professional approach Strong communication skills, written and verbal Commercial awareness Team working, being comfortable working in international teams and self managing PLEASE NOTE SECURITY CLEARANCE IS REQUIRED FOR THIS ROLE Stress Engineer Glasgow Salary **** to ****",stress engineer glasgow salary to we re currently looking for talented engineers to join our growing glasgow team at a variety of levels the roles are ideally suited to high calibre engineering graduates with any level of appropriate experience so that we can give you the opportunity to use your technical skills to provide high quality input 
to our aerospace projects spanning both aerostructures and aeroengines in return you can expect good career opportunities and the chance for advancement and personal and professional development support while you gain chartership and some opportunities to possibly travel or work in other offices in or outside of the uk the requirements you will need to have a good engineering degree that includes structural analysis such as aeronautical mechanical automotive civil with some experience in a professional engineering environment relevant to but not limited to the aerospace sector you will need to demonstrate experience in at least one or more of the following areas structural stress analysis composite stress analysis any industry linear and nonlinear finite element analysis fatigue and damage tolerance structural dynamics thermal analysis aerostructures experience you will also be expected to demonstrate the following qualities a strong desire to progress quickly to a position of leadership professional approach strong communication skills written and verbal commercial awareness team working being comfortable working in international teams and self managing please note security clearance is required for this role stress engineer glasgow salary to
2,"Mathematical Modeller / Simulation Analyst / Operational Analyst Basingstoke, Hampshire Up to ****K AAE pension contribution, private medical and dental The opportunity Our client is an independent consultancy firm which has an opportunity for a Data Analyst with 35 years experience. The role will require the successful candidate to demonstrate their ability to analyse a problem and arrive at a solution, with varying levels of data being available. Essential skills Thorough knowledge of Excel and proven ability to utilise this to create powerful decision support models Experience in Modelling and Simulation Techniques, Experience of techniques such as Discrete Event Simulation and/or SD modelling Mathematical/scientific background minimum degree qualified Proven analytical and problem solving skills Self Starter Ability to develop solid working relationships In addition to formal qualifications and experience, the successful candidate will require excellent written and verbal communication skills, be energetic, enterprising and have a determination to succeed. They will be required to build solid working relationships, both internally with colleagues and, most importantly, externally with our clients. They must be comfortable working independently to deliver against challenging client demands. The offices are located in Basingstoke, Hampshire, but our client work for clients worldwide. The successful candidate must therefore be prepared to undertake work at client sites for short periods of time. 
Physics, Mathematics, Modelling, Simulation, Analytical, Operational Research, Mathematical Modelling Mathematical Modeller / Simulation Analyst / Operational Analyst Basingstoke, Hampshire ****K AAE pension contribution, private medical and dental",mathematical modeller simulation analyst operational analyst basingstoke hampshire up to k aae pension contribution private medical and dental the opportunity our client is an independent consultancy firm which has an opportunity for a data analyst with 35 years experience the role will require the successful candidate to demonstrate their ability to analyse a problem and arrive at a solution with varying levels of data being available essential skills thorough knowledge of excel and proven ability to utilise this to create powerful decision support models experience in modelling and simulation techniques experience of techniques such as discrete event simulation and or sd modelling mathematical scientific background minimum degree qualified proven analytical and problem solving skills self starter ability to develop solid working relationships in addition to formal qualifications and experience the successful candidate will require excellent written and verbal communication skills be energetic enterprising and have a determination to succeed they will be required to build solid working relationships both internally with colleagues and most importantly externally with our clients they must be comfortable working independently to deliver against challenging client demands the offices are located in basingstoke hampshire but our client work for clients worldwide the successful candidate must therefore be prepared to undertake work at client sites for short periods of time physics mathematics modelling simulation analytical operational research mathematical modelling mathematical modeller simulation analyst operational analyst basingstoke hampshire k aae pension contribution private medical and dental
3,"Engineering Systems Analyst / Mathematical Modeller. Our client is a highly successful and respected Consultancy providing specialist software development MISER, PIONEER, Maths, Mathematical, Optimisation, Risk Analysis, Asset Management, Water Industry, Access, Excel, VBA, SQL, Systems . Engineering Systems Analyst / Mathematical Modeller. Salary ****K****K negotiable Location Dorking, Surrey",engineering systems analyst mathematical modeller our client is a highly successful and respected consultancy providing specialist software development miser pioneer maths mathematical optimisation risk analysis asset management water industry access excel vba sql systems engineering systems analyst mathematical modeller salary k k negotiable location dorking surrey
4,"Pioneer, Miser Engineering Systems Analyst Dorking Surrey Salary ****K Located in Surrey, our client provides specialist software development Pioneer, Miser Engineering Systems Analyst Dorking Surrey Salary ****K",pioneer miser engineering systems analyst dorking surrey salary k located in surrey our client provides specialist software development pioneer miser engineering systems analyst dorking surrey salary k
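
As a minimal sketch of what the pattern `[^\w\s]` does, the snippet below applies it to a single string with the standard-library `re` module instead of pandas. The helper name `strip_punctuation`, the sample sentence, and the whitespace-collapsing step are assumptions added for illustration; they are not part of the notebook.

```python
import re

# Same character class used in the cell above: anything that is
# neither a word character (\w) nor whitespace (\s).
PUNCT_PATTERN = re.compile(r"[^\w\s]")

def strip_punctuation(text: str) -> str:
    """Replace each punctuation character with a space.

    Collapsing the resulting runs of spaces is an extra step assumed here
    for readability; the notebook cell only performs the replacement.
    """
    no_punct = PUNCT_PATTERN.sub(" ", text)
    return " ".join(no_punct.split())

sample = "salary ****k negotiable, location: dorking/surrey"
cleaned = strip_punctuation(sample)
print(cleaned)  # salary k negotiable location dorking surrey
```

Note that `\w` also matches the underscore, so a token such as `self_managing` would pass through this step unchanged.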


### Applying Stopword Removal (removing irrelevant words):

In [15]:
from nltk.corpus import stopwords
", ".join(stopwords.words('english'))

"i, me, my, myself, we, our, ours, ourselves, you, you're, you've, you'll, you'd, your, yours, yourself, yourselves, he, him, his, himself, she, she's, her, hers, herself, it, it's, its, itself, they, them, their, theirs, themselves, what, which, who, whom, this, that, that'll, these, those, am, is, are, was, were, be, been, being, have, has, had, having, do, does, did, doing, a, an, the, and, but, if, or, because, as, until, while, of, at, by, for, with, about, against, between, into, through, during, before, after, above, below, to, from, up, down, in, out, on, off, over, under, again, further, then, once, here, there, when, where, why, how, all, any, both, each, few, more, most, other, some, such, no, nor, not, only, own, same, so, than, too, very, s, t, can, will, just, don, don't, should, should've, now, d, ll, m, o, re, ve, y, ain, aren, aren't, couldn, couldn't, didn, didn't, doesn, doesn't, hadn, hadn't, hasn, hasn't, haven, haven't, isn, isn't, ma, mightn, mightn't, mustn, mus

In [16]:
STOPWORDS = set(stopwords.words('english'))  # a set gives fast membership tests
def remove_stopwords(text):
  # Keep only the tokens that are not in the English stopword list.
  return " ".join([word for word in str(text).split() if word not in STOPWORDS])

df_FullDescription["processed_FullDescription"] = df_FullDescription["processed_FullDescription"].apply(remove_stopwords)
df_FullDescription.head()

Unnamed: 0,FullDescription,processed_FullDescription
0,"Engineering Systems Analyst Dorking Surrey Salary ****K Our client is located in Dorking, Surrey and are looking for Engineering Systems Analyst our client provides specialist software development Keywords Mathematical Modelling, Risk Analysis, System Modelling, Optimisation, MISER, PIONEEER Engineering Systems Analyst Dorking Surrey Salary ****K",engineering systems analyst dorking surrey salary k client located dorking surrey looking engineering systems analyst client provides specialist software development keywords mathematical modelling risk analysis system modelling optimisation miser pioneeer engineering systems analyst dorking surrey salary k
1,"Stress Engineer Glasgow Salary **** to **** We re currently looking for talented engineers to join our growing Glasgow team at a variety of levels. The roles are ideally suited to high calibre engineering graduates with any level of appropriate experience, so that we can give you the opportunity to use your technical skills to provide high quality input to our aerospace projects, spanning both aerostructures and aeroengines. In return, you can expect good career opportunities and the chance for advancement and personal and professional development, support while you gain Chartership and some opportunities to possibly travel or work in other offices, in or outside of the UK. The Requirements You will need to have a good engineering degree that includes structural analysis (such as aeronautical, mechanical, automotive, civil) with some experience in a professional engineering environment relevant to (but not limited to) the aerospace sector. You will need to demonstrate experience in at least one or more of the following areas: Structural/stress analysis Composite stress analysis (any industry) Linear and nonlinear finite element analysis Fatigue and damage tolerance Structural dynamics Thermal analysis Aerostructures experience You will also be expected to demonstrate the following qualities: A strong desire to progress quickly to a position of leadership Professional approach Strong communication skills, written and verbal Commercial awareness Team working, being comfortable working in international teams and self managing PLEASE NOTE SECURITY CLEARANCE IS REQUIRED FOR THIS ROLE Stress Engineer Glasgow Salary **** to ****",stress engineer glasgow salary currently looking talented engineers join growing glasgow team variety levels roles ideally suited high calibre engineering graduates level appropriate experience give opportunity use technical skills provide high quality input aerospace projects spanning aerostructures aeroengines return expect good career 
opportunities chance advancement personal professional development support gain chartership opportunities possibly travel work offices outside uk requirements need good engineering degree includes structural analysis aeronautical mechanical automotive civil experience professional engineering environment relevant limited aerospace sector need demonstrate experience least one following areas structural stress analysis composite stress analysis industry linear nonlinear finite element analysis fatigue damage tolerance structural dynamics thermal analysis aerostructures experience also expected demonstrate following qualities strong desire progress quickly position leadership professional approach strong communication skills written verbal commercial awareness team working comfortable working international teams self managing please note security clearance required role stress engineer glasgow salary
2,"Mathematical Modeller / Simulation Analyst / Operational Analyst Basingstoke, Hampshire Up to ****K AAE pension contribution, private medical and dental The opportunity Our client is an independent consultancy firm which has an opportunity for a Data Analyst with 35 years experience. The role will require the successful candidate to demonstrate their ability to analyse a problem and arrive at a solution, with varying levels of data being available. Essential skills Thorough knowledge of Excel and proven ability to utilise this to create powerful decision support models Experience in Modelling and Simulation Techniques, Experience of techniques such as Discrete Event Simulation and/or SD modelling Mathematical/scientific background minimum degree qualified Proven analytical and problem solving skills Self Starter Ability to develop solid working relationships In addition to formal qualifications and experience, the successful candidate will require excellent written and verbal communication skills, be energetic, enterprising and have a determination to succeed. They will be required to build solid working relationships, both internally with colleagues and, most importantly, externally with our clients. They must be comfortable working independently to deliver against challenging client demands. The offices are located in Basingstoke, Hampshire, but our client work for clients worldwide. The successful candidate must therefore be prepared to undertake work at client sites for short periods of time. 
Physics, Mathematics, Modelling, Simulation, Analytical, Operational Research, Mathematical Modelling Mathematical Modeller / Simulation Analyst / Operational Analyst Basingstoke, Hampshire ****K AAE pension contribution, private medical and dental",mathematical modeller simulation analyst operational analyst basingstoke hampshire k aae pension contribution private medical dental opportunity client independent consultancy firm opportunity data analyst 35 years experience role require successful candidate demonstrate ability analyse problem arrive solution varying levels data available essential skills thorough knowledge excel proven ability utilise create powerful decision support models experience modelling simulation techniques experience techniques discrete event simulation sd modelling mathematical scientific background minimum degree qualified proven analytical problem solving skills self starter ability develop solid working relationships addition formal qualifications experience successful candidate require excellent written verbal communication skills energetic enterprising determination succeed required build solid working relationships internally colleagues importantly externally clients must comfortable working independently deliver challenging client demands offices located basingstoke hampshire client work clients worldwide successful candidate must therefore prepared undertake work client sites short periods time physics mathematics modelling simulation analytical operational research mathematical modelling mathematical modeller simulation analyst operational analyst basingstoke hampshire k aae pension contribution private medical dental
3,"Engineering Systems Analyst / Mathematical Modeller. Our client is a highly successful and respected Consultancy providing specialist software development MISER, PIONEER, Maths, Mathematical, Optimisation, Risk Analysis, Asset Management, Water Industry, Access, Excel, VBA, SQL, Systems . Engineering Systems Analyst / Mathematical Modeller. Salary ****K****K negotiable Location Dorking, Surrey",engineering systems analyst mathematical modeller client highly successful respected consultancy providing specialist software development miser pioneer maths mathematical optimisation risk analysis asset management water industry access excel vba sql systems engineering systems analyst mathematical modeller salary k k negotiable location dorking surrey
4,"Pioneer, Miser Engineering Systems Analyst Dorking Surrey Salary ****K Located in Surrey, our client provides specialist software development Pioneer, Miser Engineering Systems Analyst Dorking Surrey Salary ****K",pioneer miser engineering systems analyst dorking surrey salary k located surrey client provides specialist software development pioneer miser engineering systems analyst dorking surrey salary k
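
To make the filtering step easier to follow, here is a small self-contained sketch that runs the same logic on one sentence. It assumes a hand-picked subset of the NLTK list printed above (`SAMPLE_STOPWORDS`) so that it runs without downloading the corpus; the function name `remove_stopwords_demo` is likewise hypothetical.

```python
# Hand-picked subset of the NLTK English stopword list shown above.
# Assumption: used only so this example runs without the NLTK corpus.
SAMPLE_STOPWORDS = {"we", "re", "for", "to", "our", "at", "a", "of"}

def remove_stopwords_demo(text, stopwords):
    """Return the text without the tokens found in `stopwords`."""
    return " ".join(word for word in text.split() if word not in stopwords)

sample = (
    "we re currently looking for talented engineers to join "
    "our growing glasgow team at a variety of levels"
)
result = remove_stopwords_demo(sample, SAMPLE_STOPWORDS)
print(result)
# currently looking talented engineers join growing glasgow team variety levels
```

Because punctuation was replaced by spaces in the previous step, fragments such as `re` (from "We re") appear as separate tokens, and the NLTK list contains entries such as `re`, `ll`, and `ve` that catch them.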


### Removing the most frequent words:

In [18]:
from collections import Counter
cnt_fulldescription = Counter() # Instance
for text in df_FullDescription["processed_FullDescription"].values:
  for word in text.split():
    cnt_fulldescription[word] += 1

cnt_fulldescription.most_common(10)

[('experience', 428042),
 ('role', 292124),
 ('work', 279778),
 ('team', 268138),
 ('business', 265987),
 ('skills', 235943),
 ('working', 222664),
 ('within', 217339),
 ('sales', 209317),
 ('client', 197545)]

**NOTE:**  
In my opinion, almost all of these words, if not all of them, are relevant for the model to learn from. For that reason, I will not remove any of them.
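
For reference only, the sketch below shows how such a removal could look if the decision were different. It is not applied in this notebook; the toy corpus, the `FREQWORDS` set, and the `remove_freqwords` helper are hypothetical names introduced for illustration.

```python
from collections import Counter

# Toy corpus standing in for the processed descriptions (assumption).
toy_corpus = [
    "engineering systems analyst",
    "systems analyst client",
    "analyst client experience",
]

# Count every token across the corpus.
word_counts = Counter()
for text in toy_corpus:
    word_counts.update(text.split())

# Hypothetical choice: treat only the single most frequent word as removable.
n_freq_words = 1
FREQWORDS = {word for word, _ in word_counts.most_common(n_freq_words)}

def remove_freqwords(text):
    """Drop the tokens that belong to FREQWORDS."""
    return " ".join(word for word in text.split() if word not in FREQWORDS)

filtered = [remove_freqwords(text) for text in toy_corpus]
print(FREQWORDS)  # {'analyst'}
print(filtered)   # ['engineering systems', 'systems client', 'client experience']
```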

### Removing rare words:

In [19]:
n_rare_words = 10
RAREWORDS = set([w for (w, wc) in cnt_fulldescription.most_common()[:-n_rare_words-1:-1]])
RAREWORDS

{'gpled',
 'grzedamcarthur',
 'grzedamcarthurnhs',
 'immedaitley',
 'lowehays',
 'organsations',
 'ruabon',
 'stephanietraveltraderecruitmnt',
 'swuk',
 'tne'}
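
The slice `[:-n_rare_words-1:-1]` can be hard to read, so here is a minimal sketch of what it selects. The counter contents below are made up for illustration.

```python
from collections import Counter

# Made-up counts for illustration (assumption).
toy_counts = Counter({"experience": 5, "role": 4, "team": 3, "swuk": 1, "tne": 1})

n_rare_words = 2
ranked = toy_counts.most_common()        # sorted from most to least frequent
tail = ranked[:-n_rare_words - 1:-1]     # walk backwards from the end, n items
rare_words = {word for word, _ in tail}

print(tail)        # [('tne', 1), ('swuk', 1)]
print(rare_words)  # contains 'swuk' and 'tne'
```

The slice is equivalent to `ranked[-n_rare_words:][::-1]`. One caveat: `most_common()` orders words with equal counts by the order in which they were first seen, so when many words occur only once, which ones end up in the tail depends on that order rather than on any measure of rarity.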

In [20]:
n_rare_words = 10
# Take the last n_rare_words entries of the frequency ranking (the least frequent words).
RAREWORDS = set([w for (w, wc) in cnt_fulldescription.most_common()[:-n_rare_words-1:-1]])
def remove_rarewords(text):
  # Drop every token that belongs to the rare-word set.
  return " ".join([word for word in str(text).split() if word not in RAREWORDS])

df_FullDescription["processed_FullDescription"] = df_FullDescription["processed_FullDescription"].apply(remove_rarewords)
df_FullDescription.head()

Unnamed: 0,FullDescription,processed_FullDescription
0,"Engineering Systems Analyst Dorking Surrey Salary ****K Our client is located in Dorking, Surrey and are looking for Engineering Systems Analyst our client provides specialist software development Keywords Mathematical Modelling, Risk Analysis, System Modelling, Optimisation, MISER, PIONEEER Engineering Systems Analyst Dorking Surrey Salary ****K",engineering systems analyst dorking surrey salary k client located dorking surrey looking engineering systems analyst client provides specialist software development keywords mathematical modelling risk analysis system modelling optimisation miser pioneeer engineering systems analyst dorking surrey salary k
1,"Stress Engineer Glasgow Salary **** to **** We re currently looking for talented engineers to join our growing Glasgow team at a variety of levels. The roles are ideally suited to high calibre engineering graduates with any level of appropriate experience, so that we can give you the opportunity to use your technical skills to provide high quality input to our aerospace projects, spanning both aerostructures and aeroengines. In return, you can expect good career opportunities and the chance for advancement and personal and professional development, support while you gain Chartership and some opportunities to possibly travel or work in other offices, in or outside of the UK. The Requirements You will need to have a good engineering degree that includes structural analysis (such as aeronautical, mechanical, automotive, civil) with some experience in a professional engineering environment relevant to (but not limited to) the aerospace sector. You will need to demonstrate experience in at least one or more of the following areas: Structural/stress analysis Composite stress analysis (any industry) Linear and nonlinear finite element analysis Fatigue and damage tolerance Structural dynamics Thermal analysis Aerostructures experience You will also be expected to demonstrate the following qualities: A strong desire to progress quickly to a position of leadership Professional approach Strong communication skills, written and verbal Commercial awareness Team working, being comfortable working in international teams and self managing PLEASE NOTE SECURITY CLEARANCE IS REQUIRED FOR THIS ROLE Stress Engineer Glasgow Salary **** to ****",stress engineer glasgow salary currently looking talented engineers join growing glasgow team variety levels roles ideally suited high calibre engineering graduates level appropriate experience give opportunity use technical skills provide high quality input aerospace projects spanning aerostructures aeroengines return expect good career 
opportunities chance advancement personal professional development support gain chartership opportunities possibly travel work offices outside uk requirements need good engineering degree includes structural analysis aeronautical mechanical automotive civil experience professional engineering environment relevant limited aerospace sector need demonstrate experience least one following areas structural stress analysis composite stress analysis industry linear nonlinear finite element analysis fatigue damage tolerance structural dynamics thermal analysis aerostructures experience also expected demonstrate following qualities strong desire progress quickly position leadership professional approach strong communication skills written verbal commercial awareness team working comfortable working international teams self managing please note security clearance required role stress engineer glasgow salary
2,"Mathematical Modeller / Simulation Analyst / Operational Analyst Basingstoke, Hampshire Up to ****K AAE pension contribution, private medical and dental The opportunity Our client is an independent consultancy firm which has an opportunity for a Data Analyst with 35 years experience. The role will require the successful candidate to demonstrate their ability to analyse a problem and arrive at a solution, with varying levels of data being available. Essential skills Thorough knowledge of Excel and proven ability to utilise this to create powerful decision support models Experience in Modelling and Simulation Techniques, Experience of techniques such as Discrete Event Simulation and/or SD modelling Mathematical/scientific background minimum degree qualified Proven analytical and problem solving skills Self Starter Ability to develop solid working relationships In addition to formal qualifications and experience, the successful candidate will require excellent written and verbal communication skills, be energetic, enterprising and have a determination to succeed. They will be required to build solid working relationships, both internally with colleagues and, most importantly, externally with our clients. They must be comfortable working independently to deliver against challenging client demands. The offices are located in Basingstoke, Hampshire, but our client work for clients worldwide. The successful candidate must therefore be prepared to undertake work at client sites for short periods of time. 
Physics, Mathematics, Modelling, Simulation, Analytical, Operational Research, Mathematical Modelling Mathematical Modeller / Simulation Analyst / Operational Analyst Basingstoke, Hampshire ****K AAE pension contribution, private medical and dental",mathematical modeller simulation analyst operational analyst basingstoke hampshire k aae pension contribution private medical dental opportunity client independent consultancy firm opportunity data analyst 35 years experience role require successful candidate demonstrate ability analyse problem arrive solution varying levels data available essential skills thorough knowledge excel proven ability utilise create powerful decision support models experience modelling simulation techniques experience techniques discrete event simulation sd modelling mathematical scientific background minimum degree qualified proven analytical problem solving skills self starter ability develop solid working relationships addition formal qualifications experience successful candidate require excellent written verbal communication skills energetic enterprising determination succeed required build solid working relationships internally colleagues importantly externally clients must comfortable working independently deliver challenging client demands offices located basingstoke hampshire client work clients worldwide successful candidate must therefore prepared undertake work client sites short periods time physics mathematics modelling simulation analytical operational research mathematical modelling mathematical modeller simulation analyst operational analyst basingstoke hampshire k aae pension contribution private medical dental
3,"Engineering Systems Analyst / Mathematical Modeller. Our client is a highly successful and respected Consultancy providing specialist software development MISER, PIONEER, Maths, Mathematical, Optimisation, Risk Analysis, Asset Management, Water Industry, Access, Excel, VBA, SQL, Systems . Engineering Systems Analyst / Mathematical Modeller. Salary ****K****K negotiable Location Dorking, Surrey",engineering systems analyst mathematical modeller client highly successful respected consultancy providing specialist software development miser pioneer maths mathematical optimisation risk analysis asset management water industry access excel vba sql systems engineering systems analyst mathematical modeller salary k k negotiable location dorking surrey
4,"Pioneer, Miser Engineering Systems Analyst Dorking Surrey Salary ****K Located in Surrey, our client provides specialist software development Pioneer, Miser Engineering Systems Analyst Dorking Surrey Salary ****K",pioneer miser engineering systems analyst dorking surrey salary k located surrey client provides specialist software development pioneer miser engineering systems analyst dorking surrey salary k


### Applying Stemming:

In [22]:
from nltk.stem.porter import PorterStemmer

stemmer = PorterStemmer() # One shared instance, reused for every row.
def stem_words(text):
  # Stem each whitespace-separated token, then join the tokens back into one string.
  return " ".join([stemmer.stem(word) for word in text.split()])

df_FullDescription["processed_FullDescription"] = df_FullDescription["processed_FullDescription"].apply(stem_words)
df_FullDescription.head()

Unnamed: 0,FullDescription,processed_FullDescription
0,"Engineering Systems Analyst Dorking Surrey Salary ****K Our client is located in Dorking, Surrey and are looking for Engineering Systems Analyst our client provides specialist software development Keywords Mathematical Modelling, Risk Analysis, System Modelling, Optimisation, MISER, PIONEEER Engineering Systems Analyst Dorking Surrey Salary ****K",engin system analyst dork surrey salari k client locat dork surrey look engin system analyst client provid specialist softwar develop keyword mathemat model risk analysi system model optimis miser pioneeer engin system analyst dork surrey salari k
1,"Stress Engineer Glasgow Salary **** to **** We re currently looking for talented engineers to join our growing Glasgow team at a variety of levels. The roles are ideally suited to high calibre engineering graduates with any level of appropriate experience, so that we can give you the opportunity to use your technical skills to provide high quality input to our aerospace projects, spanning both aerostructures and aeroengines. In return, you can expect good career opportunities and the chance for advancement and personal and professional development, support while you gain Chartership and some opportunities to possibly travel or work in other offices, in or outside of the UK. The Requirements You will need to have a good engineering degree that includes structural analysis (such as aeronautical, mechanical, automotive, civil) with some experience in a professional engineering environment relevant to (but not limited to) the aerospace sector. You will need to demonstrate experience in at least one or more of the following areas: Structural/stress analysis Composite stress analysis (any industry) Linear and nonlinear finite element analysis Fatigue and damage tolerance Structural dynamics Thermal analysis Aerostructures experience You will also be expected to demonstrate the following qualities: A strong desire to progress quickly to a position of leadership Professional approach Strong communication skills, written and verbal Commercial awareness Team working, being comfortable working in international teams and self managing PLEASE NOTE SECURITY CLEARANCE IS REQUIRED FOR THIS ROLE Stress Engineer Glasgow Salary **** to ****",stress engin glasgow salari current look talent engin join grow glasgow team varieti level role ideal suit high calibr engin graduat level appropri experi give opportun use technic skill provid high qualiti input aerospac project span aerostructur aeroengin return expect good career opportun chanc advanc person profession develop support gain 
chartership opportun possibl travel work offic outsid uk requir need good engin degre includ structur analysi aeronaut mechan automot civil experi profession engin environ relev limit aerospac sector need demonstr experi least one follow area structur stress analysi composit stress analysi industri linear nonlinear finit element analysi fatigu damag toler structur dynam thermal analysi aerostructur experi also expect demonstr follow qualiti strong desir progress quickli posit leadership profession approach strong commun skill written verbal commerci awar team work comfort work intern team self manag pleas note secur clearanc requir role stress engin glasgow salari
2,"Mathematical Modeller / Simulation Analyst / Operational Analyst Basingstoke, Hampshire Up to ****K AAE pension contribution, private medical and dental The opportunity Our client is an independent consultancy firm which has an opportunity for a Data Analyst with 35 years experience. The role will require the successful candidate to demonstrate their ability to analyse a problem and arrive at a solution, with varying levels of data being available. Essential skills Thorough knowledge of Excel and proven ability to utilise this to create powerful decision support models Experience in Modelling and Simulation Techniques, Experience of techniques such as Discrete Event Simulation and/or SD modelling Mathematical/scientific background minimum degree qualified Proven analytical and problem solving skills Self Starter Ability to develop solid working relationships In addition to formal qualifications and experience, the successful candidate will require excellent written and verbal communication skills, be energetic, enterprising and have a determination to succeed. They will be required to build solid working relationships, both internally with colleagues and, most importantly, externally with our clients. They must be comfortable working independently to deliver against challenging client demands. The offices are located in Basingstoke, Hampshire, but our client work for clients worldwide. The successful candidate must therefore be prepared to undertake work at client sites for short periods of time. 
Physics, Mathematics, Modelling, Simulation, Analytical, Operational Research, Mathematical Modelling Mathematical Modeller / Simulation Analyst / Operational Analyst Basingstoke, Hampshire ****K AAE pension contribution, private medical and dental",mathemat model simul analyst oper analyst basingstok hampshir k aae pension contribut privat medic dental opportun client independ consult firm opportun data analyst 35 year experi role requir success candid demonstr abil analys problem arriv solut vari level data avail essenti skill thorough knowledg excel proven abil utilis creat power decis support model experi model simul techniqu experi techniqu discret event simul sd model mathemat scientif background minimum degre qualifi proven analyt problem solv skill self starter abil develop solid work relationship addit formal qualif experi success candid requir excel written verbal commun skill energet enterpris determin succeed requir build solid work relationship intern colleagu importantli extern client must comfort work independ deliv challeng client demand offic locat basingstok hampshir client work client worldwid success candid must therefor prepar undertak work client site short period time physic mathemat model simul analyt oper research mathemat model mathemat model simul analyst oper analyst basingstok hampshir k aae pension contribut privat medic dental
3,"Engineering Systems Analyst / Mathematical Modeller. Our client is a highly successful and respected Consultancy providing specialist software development MISER, PIONEER, Maths, Mathematical, Optimisation, Risk Analysis, Asset Management, Water Industry, Access, Excel, VBA, SQL, Systems . Engineering Systems Analyst / Mathematical Modeller. Salary ****K****K negotiable Location Dorking, Surrey",engin system analyst mathemat model client highli success respect consult provid specialist softwar develop miser pioneer math mathemat optimis risk analysi asset manag water industri access excel vba sql system engin system analyst mathemat model salari k k negoti locat dork surrey
4,"Pioneer, Miser Engineering Systems Analyst Dorking Surrey Salary ****K Located in Surrey, our client provides specialist software development Pioneer, Miser Engineering Systems Analyst Dorking Surrey Salary ****K",pioneer miser engin system analyst dork surrey salari k locat surrey client provid specialist softwar develop pioneer miser engin system analyst dork surrey salari k
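
To make the effect of stemming concrete, here is a minimal sketch that runs the same `PorterStemmer` on a handful of words taken from the descriptions above. The `sample_words` list is only an illustration, not part of the original notebook. The stems it produces are the ones visible in the output table.

```python
from nltk.stem.porter import PorterStemmer

stemmer = PorterStemmer()

# Illustrative sample: words that appear in the job descriptions above.
sample_words = ["engineering", "systems", "salary", "mathematical", "modelling"]

# Stem each word independently, exactly as stem_words does token by token.
stems = [stemmer.stem(word) for word in sample_words]
print(stems)
```

Stems such as `salari` and `engin` are not dictionary words. That is expected, since Porter stemming only strips suffixes by rule.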


### Applying Lemmatization:

In [24]:
import nltk
nltk.download('omw-1.4')

[nltk_data] Downloading package omw-1.4 to /home/drigols/nltk_data...
[nltk_data]   Unzipping corpora/omw-1.4.zip.


True

In [25]:
from nltk.stem import WordNetLemmatizer

lemmatizer = WordNetLemmatizer() # One shared instance, reused for every row.
def lemmatize_words(text):
  # Lemmatize each whitespace-separated token (treated as a noun by default) and re-join.
  return " ".join([lemmatizer.lemmatize(word) for word in text.split()])

df_FullDescription["processed_FullDescription"] = df_FullDescription["processed_FullDescription"].apply(lemmatize_words)
df_FullDescription.head()

Unnamed: 0,FullDescription,processed_FullDescription
0,"Engineering Systems Analyst Dorking Surrey Salary ****K Our client is located in Dorking, Surrey and are looking for Engineering Systems Analyst our client provides specialist software development Keywords Mathematical Modelling, Risk Analysis, System Modelling, Optimisation, MISER, PIONEEER Engineering Systems Analyst Dorking Surrey Salary ****K",engin system analyst dork surrey salari k client locat dork surrey look engin system analyst client provid specialist softwar develop keyword mathemat model risk analysi system model optimis miser pioneeer engin system analyst dork surrey salari k
1,"Stress Engineer Glasgow Salary **** to **** We re currently looking for talented engineers to join our growing Glasgow team at a variety of levels. The roles are ideally suited to high calibre engineering graduates with any level of appropriate experience, so that we can give you the opportunity to use your technical skills to provide high quality input to our aerospace projects, spanning both aerostructures and aeroengines. In return, you can expect good career opportunities and the chance for advancement and personal and professional development, support while you gain Chartership and some opportunities to possibly travel or work in other offices, in or outside of the UK. The Requirements You will need to have a good engineering degree that includes structural analysis (such as aeronautical, mechanical, automotive, civil) with some experience in a professional engineering environment relevant to (but not limited to) the aerospace sector. You will need to demonstrate experience in at least one or more of the following areas: Structural/stress analysis Composite stress analysis (any industry) Linear and nonlinear finite element analysis Fatigue and damage tolerance Structural dynamics Thermal analysis Aerostructures experience You will also be expected to demonstrate the following qualities: A strong desire to progress quickly to a position of leadership Professional approach Strong communication skills, written and verbal Commercial awareness Team working, being comfortable working in international teams and self managing PLEASE NOTE SECURITY CLEARANCE IS REQUIRED FOR THIS ROLE Stress Engineer Glasgow Salary **** to ****",stress engin glasgow salari current look talent engin join grow glasgow team varieti level role ideal suit high calibr engin graduat level appropri experi give opportun use technic skill provid high qualiti input aerospac project span aerostructur aeroengin return expect good career opportun chanc advanc person profession develop support gain 
chartership opportun possibl travel work offic outsid uk requir need good engin degre includ structur analysi aeronaut mechan automot civil experi profession engin environ relev limit aerospac sector need demonstr experi least one follow area structur stress analysi composit stress analysi industri linear nonlinear finit element analysi fatigu damag toler structur dynam thermal analysi aerostructur experi also expect demonstr follow qualiti strong desir progress quickli posit leadership profession approach strong commun skill written verbal commerci awar team work comfort work intern team self manag plea note secur clearanc requir role stress engin glasgow salari
2,"Mathematical Modeller / Simulation Analyst / Operational Analyst Basingstoke, Hampshire Up to ****K AAE pension contribution, private medical and dental The opportunity Our client is an independent consultancy firm which has an opportunity for a Data Analyst with 35 years experience. The role will require the successful candidate to demonstrate their ability to analyse a problem and arrive at a solution, with varying levels of data being available. Essential skills Thorough knowledge of Excel and proven ability to utilise this to create powerful decision support models Experience in Modelling and Simulation Techniques, Experience of techniques such as Discrete Event Simulation and/or SD modelling Mathematical/scientific background minimum degree qualified Proven analytical and problem solving skills Self Starter Ability to develop solid working relationships In addition to formal qualifications and experience, the successful candidate will require excellent written and verbal communication skills, be energetic, enterprising and have a determination to succeed. They will be required to build solid working relationships, both internally with colleagues and, most importantly, externally with our clients. They must be comfortable working independently to deliver against challenging client demands. The offices are located in Basingstoke, Hampshire, but our client work for clients worldwide. The successful candidate must therefore be prepared to undertake work at client sites for short periods of time. 
Physics, Mathematics, Modelling, Simulation, Analytical, Operational Research, Mathematical Modelling Mathematical Modeller / Simulation Analyst / Operational Analyst Basingstoke, Hampshire ****K AAE pension contribution, private medical and dental",mathemat model simul analyst oper analyst basingstok hampshir k aae pension contribut privat medic dental opportun client independ consult firm opportun data analyst 35 year experi role requir success candid demonstr abil analys problem arriv solut vari level data avail essenti skill thorough knowledg excel proven abil utilis creat power decis support model experi model simul techniqu experi techniqu discret event simul sd model mathemat scientif background minimum degre qualifi proven analyt problem solv skill self starter abil develop solid work relationship addit formal qualif experi success candid requir excel written verbal commun skill energet enterpris determin succeed requir build solid work relationship intern colleagu importantli extern client must comfort work independ deliv challeng client demand offic locat basingstok hampshir client work client worldwid success candid must therefor prepar undertak work client site short period time physic mathemat model simul analyt oper research mathemat model mathemat model simul analyst oper analyst basingstok hampshir k aae pension contribut privat medic dental
3,"Engineering Systems Analyst / Mathematical Modeller. Our client is a highly successful and respected Consultancy providing specialist software development MISER, PIONEER, Maths, Mathematical, Optimisation, Risk Analysis, Asset Management, Water Industry, Access, Excel, VBA, SQL, Systems . Engineering Systems Analyst / Mathematical Modeller. Salary ****K****K negotiable Location Dorking, Surrey",engin system analyst mathemat model client highli success respect consult provid specialist softwar develop miser pioneer math mathemat optimis risk analysi asset manag water industri access excel vba sql system engin system analyst mathemat model salari k k negoti locat dork surrey
4,"Pioneer, Miser Engineering Systems Analyst Dorking Surrey Salary ****K Located in Surrey, our client provides specialist software development Pioneer, Miser Engineering Systems Analyst Dorking Surrey Salary ****K",pioneer miser engin system analyst dork surrey salari k locat surrey client provid specialist softwar develop pioneer miser engin system analyst dork surrey salari k


### Applying Count Vectorizer:

In [26]:
from sklearn.feature_extraction.text import CountVectorizer

vectorizer = CountVectorizer() # Instance.
df_FullDescription_vectorized = vectorizer.fit_transform(df_FullDescription["processed_FullDescription"])
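
As a rough illustration of what `CountVectorizer` produces, the sketch below builds the same kind of document-term count matrix by hand, using only the standard library. It is not scikit-learn's implementation: the helper `count_vectorize` and the two toy documents are assumptions made for this example. It mimics the default token pattern, which keeps only tokens with two or more characters, so a lone `k` is dropped.

```python
import re
from collections import Counter

# Same idea as scikit-learn's default token pattern: tokens of 2+ word characters.
TOKEN_PATTERN = re.compile(r"\b\w\w+\b")

def count_vectorize(documents):
    """Return (vocabulary, rows): sorted terms and one count row per document."""
    tokenized = [TOKEN_PATTERN.findall(doc.lower()) for doc in documents]
    # The vocabulary is the sorted set of every token seen in any document.
    vocabulary = sorted({token for tokens in tokenized for token in tokens})
    rows = []
    for tokens in tokenized:
        counts = Counter(tokens)
        # One column per vocabulary term; absent terms count as 0.
        rows.append([counts[term] for term in vocabulary])
    return vocabulary, rows

# Toy documents built from stems seen in the output above.
docs = ["engin system analyst salari k", "stress engin glasgow salari engin"]
vocabulary, rows = count_vectorize(docs)
print(vocabulary)
print(rows)
```

Most entries in such a matrix are zero once the vocabulary grows, which is why `fit_transform` returns a sparse matrix rather than a dense array.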

### Saving the sparse matrix:

In [27]:
import scipy.sparse

scipy.sparse.save_npz('df_fulldescription_vectorized.npz', df_FullDescription_vectorized)
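
A file written with `save_npz` can be read back with `scipy.sparse.load_npz`. The sketch below is a small round-trip check under stated assumptions: the toy matrix stands in for `df_FullDescription_vectorized`, and the file name `toy_vectorized.npz` is hypothetical.

```python
import os
import tempfile

import numpy as np
import scipy.sparse

# Toy count matrix standing in for df_FullDescription_vectorized.
matrix = scipy.sparse.csr_matrix(np.array([[1, 0, 2], [0, 3, 0]]))

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "toy_vectorized.npz")  # hypothetical file name
    scipy.sparse.save_npz(path, matrix)
    loaded = scipy.sparse.load_npz(path)

# The reloaded matrix keeps its shape and every stored value.
same_shape = loaded.shape == matrix.shape
same_values = (loaded != matrix).nnz == 0
print(same_shape, same_values)
```

Note that only the matrix is stored. The fitted vocabulary lives in the `vectorizer` object and would have to be persisted separately if new text needs to be transformed later.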

# 04 - Load
> The **load** stage is responsible for saving the data that has already been ***preprocessed***, for one or more columns (features).

**NOTE:**  
This stage follows an incremental logic: in each iteration **(Load-v1, Load-v2, ..., Load-vn)** we save the data manipulated so far, with the goal of finding a better metric or a better way of modelling the data.

---

## 04.1 - Load-v1
For this 1st **Load** we start with the simplest possible columns (features):
 - **Title** as the **independent** variable.
 - **"SalaryNormalized"** as the **dependent** variable.

**NOTE:**  
I chose these columns (features) because they have already received some **basic preprocessing** (which does not mean that further changes cannot be made).

### Saving the sparse matrix "df_title_vectorized" (result of preprocessing the Title feature):
First, let's save the result of **preprocessing** the **Title** column (feature).

In [27]:
import scipy.sparse
scipy.sparse.save_npz('df_title_vectorized.npz', df_title_vectorized)

### SalaryNormalized:
The **"SalaryNormalized"** column (feature) will be read directly when the model is trained, since no changes were made to it.
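
One way to pick up the target at training time is to read just that column with `pandas`. This is a minimal sketch, not the notebook's actual training code: the in-memory CSV and its two rows are made up for illustration.

```python
import io

import pandas as pd

# Toy CSV standing in for the training file; rows and values are made up.
toy_csv = io.StringIO(
    "Id,Title,SalaryNormalized\n"
    "1,Engineering Systems Analyst,25000\n"
    "2,Stress Engineer,30000\n"
)

# Read only the target column; row order follows the file, so it lines up
# with a feature matrix built from the same rows in the same order.
y = pd.read_csv(toy_csv, usecols=["SalaryNormalized"])["SalaryNormalized"].to_numpy()
print(y)
```

This alignment only holds if no rows were dropped or reordered during preprocessing. Otherwise the same filter would need to be applied to the target.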

---

## 04.2 - Load-v2

For the 2nd **Load** we pass the **FullDescription** column (feature), which was just *preprocessed*, on to the **Training & Validation** stage.

### Saving the sparse matrix "df_FullDescription_vectorized" (result of preprocessing the FullDescription feature)

In [27]:
import scipy.sparse

scipy.sparse.save_npz('df_fulldescription_vectorized.npz', df_FullDescription_vectorized)

# Summaries

 - **Load-v1:**
   - In ***Load-v1*** the **Title** column (feature) was **preprocessed**.
   - The **SalaryNormalized** column (feature), which had already been normalized by **Adzuna**, was also used.
   - The goal was to have **features** available as quickly as possible for the **Modelling & Validation** stage.
 - **Load-v2:**
   - In **Load-v2** the **FullDescription** column (feature) was **preprocessed**.