# Log analytics with Kinesis Firehose

Log analytics means analyzing log data from websites, mobile devices, servers, sensors, and other sources to support applications such as security event monitoring, digital marketing, application monitoring, fraud detection, ad tech, gaming, and IoT. In this lab, you will ingest Apache access logs and deliver them to Amazon S3 using Amazon Kinesis Data Firehose, without managing any infrastructure. You can then query the log files with Amazon Athena to understand access patterns and website performance issues.

![](https://user-images.githubusercontent.com/62965911/214810320-b27f4355-6f05-4f31-8b1c-4ef8a7b31983.png)

To create the resources, use either the `https://aws-streaming-artifacts.s3.amazonaws.com/firehose-immersion-day/setup-tsv-env.yaml` template or [the template in this repository](./LogAnalyticsFirehose/template.yml).

### Send Apache access logs to Kinesis Firehose

We will use the Kinesis Data Generator to send data to Kinesis Firehose. To set it up, use [this template](./LogAnalyticsFirehose/datagen.yml).

Then use the following record template in the data generator:

```
{{internet.ip}} - {{name.firstName}} [{{date.now("DD/MMM/YYYY:HH:mm:ss ZZ")}}] "{{random.weightedArrayElement({"weights":[0.6,0.1,0.1,0.2],"data":["GET","POST","DELETE","PUT"]})}} {{random.arrayElement(["/list","/wp-content","/wp-admin","/explore","/search/tag/list","/app/main/posts","/posts/posts/explore"])}}" {{random.weightedArrayElement({"weights": [0.9,0.04,0.02,0.04], "data":["200","404","500","301"]})}} {{random.number(10000)}}
```

![](https://user-images.githubusercontent.com/62965911/214810299-73b241b6-e364-4963-8b50-fca73201dc30.png)
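To get a feel for what each record looks like before it reaches Firehose, here is a minimal local sketch that builds one line in the same shape as the record template above. It only approximates the generator: the `FIRST_NAMES` list is a hypothetical stand-in for `name.firstName`, the timestamp is assumed to be in UTC, and the size range of 0 to 10000 is an assumption about `random.number(10000)`.

```python
import random
from datetime import datetime, timezone

# Values copied from the record template above.
METHODS = ["GET", "POST", "DELETE", "PUT"]
METHOD_WEIGHTS = [0.6, 0.1, 0.1, 0.2]
PATHS = ["/list", "/wp-content", "/wp-admin", "/explore",
         "/search/tag/list", "/app/main/posts", "/posts/posts/explore"]
STATUSES = ["200", "404", "500", "301"]
STATUS_WEIGHTS = [0.9, 0.04, 0.02, 0.04]

# Hypothetical stand-ins for {{name.firstName}}; the real generator uses its own data.
FIRST_NAMES = ["Alice", "Bob", "Carol"]


def fake_log_line(rng, now=None):
    """Build one line shaped like the records the template emits."""
    now = now or datetime.now(timezone.utc)  # assumption: UTC timestamps
    ip = ".".join(str(rng.randint(0, 255)) for _ in range(4))
    user = rng.choice(FIRST_NAMES)
    # Mirrors the "DD/MMM/YYYY:HH:mm:ss ZZ" format, e.g. 26/Jan/2023:12:00:00 +0000
    timestamp = now.strftime("%d/%b/%Y:%H:%M:%S %z")
    method = rng.choices(METHODS, weights=METHOD_WEIGHTS)[0]
    path = rng.choice(PATHS)
    status = rng.choices(STATUSES, weights=STATUS_WEIGHTS)[0]
    size = rng.randint(0, 10000)  # assumption: inclusive range
    return f'{ip} - {user} [{timestamp}] "{method} {path}" {status} {size}'


print(fake_log_line(random.Random(42)))
```

Each line has seven space-separated logical fields (IP, client ID, user, bracketed timestamp, quoted request, status, size), which is the layout the Athena table later relies on.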

Check that the delivered log files appear in your S3 bucket:

![](https://user-images.githubusercontent.com/62965911/214810306-d372b25a-cd83-44fb-9db0-700dc76c739f.png)

### Analyze data using Amazon Athena

Run the following statement in the Athena query editor, replacing `<your-bucket-name>` with the name of your bucket:

```
CREATE EXTERNAL TABLE apache_logs(
  client_ip string,
  client_id string,
  user_id string,
  request_received_time string,
  client_request string,
  server_status string,
  returned_obj_size string
  )
ROW FORMAT SERDE
   'com.amazonaws.glue.serde.GrokSerDe'
WITH SERDEPROPERTIES (
   'input.format'='^%{IPV4:client_ip} %{DATA:client_id} %{USERNAME:user_id} %{GREEDYDATA:request_received_time} %{QUOTEDSTRING:client_request} %{DATA:server_status} %{DATA:returned_obj_size}$'
   )
STORED AS INPUTFORMAT
   'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
   'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
   's3://<your-bucket-name>/access-logs-tsv/';
```
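To see how the `input.format` pattern splits a log line into the seven columns, the following sketch applies a rough regular-expression equivalent to one sample line. The regex fragments are simplified approximations of the Grok patterns, not the SerDe's actual definitions, and the sample line and the `parse_line` helper are made up for illustration.

```python
import re

# Simplified regex stand-ins for the Grok patterns named in 'input.format'.
LOG_LINE = re.compile(
    r'^(?P<client_ip>\d{1,3}(?:\.\d{1,3}){3}) '   # IPV4 (simplified)
    r'(?P<client_id>.*?) '                        # DATA: lazy match
    r'(?P<user_id>[a-zA-Z0-9._-]+) '              # USERNAME
    r'(?P<request_received_time>.*) '             # GREEDYDATA: greedy match
    r'(?P<client_request>"(?:[^"\\]|\\.)*") '     # QUOTEDSTRING (double quotes only)
    r'(?P<server_status>.*?) '                    # DATA
    r'(?P<returned_obj_size>.*?)$'                # DATA
)


def parse_line(line):
    """Return a dict of column name to value, or None if the line does not match."""
    match = LOG_LINE.match(line)
    return match.groupdict() if match else None


# Hypothetical sample line in the shape the data generator template emits.
sample = '203.0.113.7 - Alice [26/Jan/2023:12:00:00 +0000] "GET /list" 200 5123'
row = parse_line(sample)
for column, value in row.items():
    print(f"{column}: {value}")
```

In this sketch the timestamp keeps its square brackets and the request keeps its surrounding double quotes, because the pattern captures those characters as part of the field. All columns are declared as `string`, so numeric comparisons on the status or size need a cast in the query.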

Preview the table with the following query:

```
SELECT * FROM "default"."apache_logs" LIMIT 10;
```

![](https://user-images.githubusercontent.com/62965911/214810287-513b8d91-3229-4f01-82b1-170fd3326221.png)