Competition (Convolve)

Convolve is an ML/AI hackathon organized by Cisco along with six IITs. The competition aims to spotlight some of the finest practitioners of Data Science and pit them against each other, with prizes worth more than 2 lakhs. Participants are expected to be well versed in the methods of Machine Learning and Data Analytics, and in how real-life industry problems are solved using them. It is open to all and consists of 3 rounds. The first two rounds are online, and the final round is hosted offline at the campus of IIT-Guwahati. To know more about Convolve, click here.

Epoch 1

This is the first round of the competition, from which the top 100 teams are selected for the next round. All registered teams compete on the Kaggle platform.

Achievement

Our team, Enthustats, ranked 57th among all registered teams and qualified for round 2.

Problem Statement

This contest is about Log Anomaly Detection. In computing, logging is the act of keeping a log of events that occur in a computer system, such as problems, errors, or just information on current operations. These events may occur in the operating system or in other software. A message or log entry is recorded for each such event. Log Anomaly Detection means detecting anomalies in the logs produced by software, here using Machine Learning.

Dataset

I have been given a training dataset and a testing dataset containing logs generated by software. The training dataset is in JSON format, where each key is a single software log and the corresponding value is the label for that log. The labels are "abnormal" and "normal". The testing dataset is in CSV format with 2 columns: a unique ID and a software log.
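The layout described above can be read with the standard library alone. The following is a minimal sketch, not the code used for the submission: the sample strings, the `id`/`log` header names, and the helper names `load_training` and `load_test` are all hypothetical, and the CSV is assumed to have a header row.

```python
import csv
import io
import json

# Hypothetical sample data that mirrors the described layout.
# These strings are illustrative and are not taken from the real dataset.
train_json = (
    '{"Connection established to node 4": "normal", '
    '"Kernel panic: unable to mount root": "abnormal"}'
)
test_csv = "id,log\n1,Disk check completed\n2,Segfault in worker thread\n"


def load_training(text):
    """Return parallel lists of log messages and labels from the JSON mapping."""
    mapping = json.loads(text)  # keys are logs, values are labels
    logs = list(mapping.keys())
    labels = list(mapping.values())
    return logs, labels


def load_test(text):
    """Return (id, log) pairs from a two-column CSV (header row assumed)."""
    reader = csv.reader(io.StringIO(text))
    next(reader)  # skip the assumed header row
    return [(row[0], row[1]) for row in reader]


train_logs, train_labels = load_training(train_json)
test_rows = load_test(test_csv)
```

With real files, the same functions could be fed the contents of the JSON and CSV files read from disk.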

For Dataset click here

Task

My task is to train an ML model on the given training data that can predict whether a given log in the testing data is an anomaly or normal.

Prerequisites

  • Random Forest Classification
  • Natural Language Processing

Libraries/Modules/Classes Imported

  • NumPy
  • Pandas
  • RandomForestClassifier
  • RandomUnderSampler
  • RandomOverSampler
  • train_test_split
  • text
  • cross_val_score
  • CountVectorizer
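
One way several of the pieces listed above might fit together is sketched below: a bag-of-words representation from `CountVectorizer` feeding a `RandomForestClassifier`, checked with `train_test_split` and `cross_val_score`. This is an illustration under assumptions, not the team's actual notebook: the toy logs are made up, the hyperparameters are arbitrary, and the resampling step (`RandomUnderSampler` / `RandomOverSampler`) is left out because the toy classes are already balanced.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline

# Hypothetical toy logs; the real competition data is not reproduced here.
normal_logs = [f"session {i} opened for user" for i in range(20)]
abnormal_logs = [f"fatal error code {i} kernel panic" for i in range(20)]
logs = normal_logs + abnormal_logs
labels = ["normal"] * 20 + ["abnormal"] * 20

# Hold out a stratified validation split so both classes appear in each part.
X_train, X_val, y_train, y_val = train_test_split(
    logs, labels, test_size=0.25, stratify=labels, random_state=0
)

# Token counts as features, random forest as the classifier.
# n_estimators and random_state are arbitrary choices for this sketch.
model = make_pipeline(
    CountVectorizer(),
    RandomForestClassifier(n_estimators=50, random_state=0),
)
model.fit(X_train, y_train)

# Accuracy on the held-out split and 5-fold cross-validated accuracy.
val_accuracy = model.score(X_val, y_val)
cv_scores = cross_val_score(model, logs, labels, cv=5)

# Predict labels for unseen logs written in the same style as the toy data.
predictions = model.predict(
    ["fatal error code 99 kernel panic", "session 99 opened for user"]
)
```

Wrapping the vectorizer and the classifier in one pipeline means the vocabulary is learned only from the training folds during cross-validation, which avoids leaking validation text into the features.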

Kaggle Submission Score: 0.93026