
Example Mappings of two Palo Alto log sources to ECS 1.0.0-beta2 #352

Closed
MikePaquette opened this issue Mar 1, 2019 · 14 comments
Labels: discuss, mapping, use case

@MikePaquette
Contributor

The attached Excel file proposes a logical mapping of pan_traffic and pan_threat logs to ECS 1.0.0-beta2.

The Palo Alto field definitions were obtained from:

As a reminder, in ECS, an inline firewall device takes the role of "observer" as shown below:
(diagram: an inline firewall acting as the ECS observer)

Notes:

  • PAN devices can generate logs in various logging formats. This mapping is based on the Syslog Field Definitions.
  • This mapping is not an official part of ECS; it is simply offered as an example of how a logical mapping of a commonly used security device could be performed in ECS.
  • This example does not contain an index mapping template, an ingest node pipeline, or a Logstash configuration. Creating those is left to you :-) (a rough sketch of the field-renaming step follows this list).
  • There are columns in this spreadsheet for "enhancement request fields", that is, fields that are not currently defined by ECS but would likely be useful in the future.
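To give a flavor of the field-renaming step mentioned above, here is a minimal Python sketch. It is illustrative only: the input record and the handful of mappings shown are not an official ECS artifact; the attached spreadsheet is the actual proposal.

```python
# Hypothetical sketch: rename a few parsed PAN-OS traffic fields to ECS names.
# The input dict and the chosen mappings are examples only; consult the
# attached spreadsheet for the proposed logical mapping.

PAN_TO_ECS = {
    "src": "source.ip",
    "dst": "destination.ip",
    "sport": "source.port",
    "dport": "destination.port",
    "bytes_sent": "source.bytes",
    "bytes_received": "destination.bytes",
    "action": "event.action",
}

def to_ecs(pan_record: dict) -> dict:
    """Return a flat dict keyed by ECS field names (dotted notation)."""
    return {ecs: pan_record[pan] for pan, ecs in PAN_TO_ECS.items() if pan in pan_record}

example = {"src": "10.1.2.3", "dst": "203.0.113.9", "sport": 49152, "dport": 443, "action": "allow"}
print(to_ecs(example))
# {'source.ip': '10.1.2.3', 'destination.ip': '203.0.113.9',
#  'source.port': 49152, 'destination.port': 443, 'event.action': 'allow'}
```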

ECS_Mapping_Examples_PAN_1.0.0-beta2.xlsx

Questions and feedback are welcome: on the mappings themselves, on the value of such logical mappings, and on whether such logical mappings might be useful as part of the repo in the future.

@webmat
Contributor

webmat commented Mar 1, 2019

Love this.

I think it would be great to have a place where we collect and curate examples with supporting files (like this) publicly. Perhaps on the main website, perhaps not.

In the meantime, I think it makes sense to use the use case GitHub label -- which we will specifically curate -- to pull them out of GitHub issues.

@webmat
Contributor

webmat commented Mar 1, 2019

Looking at the spreadsheet, and the Palo Alto docs, would it be worth having a column for the programmatic field name? What's in parens on their doc is the programmatic field name, correct?

@webmat
Contributor

webmat commented Mar 1, 2019

Another meta comment: having a sample log file attached here would be great :-)

@davidhowell-tx

Was just looking this over, and I think the time fields need to be modified.

"Generate Time" is when the event occurred and was originally logged on the firewall that observed the event, whereas "Received Time" is the time that the event was received by the management system (i.e. Panorama, if you're using it). In my logs I'm seeing Generate Time on average 9 seconds earlier than Receive Time.

Therefore, I believe "Generate Time" should be @timestamp.

I'm unsure where to place "Receive Time". I feel like event.created would be the original @timestamp value before it's replaced by "Generate Time" so you can track the overall time from source to Elasticsearch, but it could also be helpful to know "Receive Time" to determine if the discrepancy is from source firewall to management server (Panorama) or from management server to Elasticsearch.
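For illustration only (not part of the proposal), this is one way to express that choice once the PAN timestamps have been parsed; the sample values and the custom field for Receive Time are made up:

```python
from datetime import datetime, timezone

# Illustrative: "Generate Time" -> @timestamp (when the firewall logged the
# event); "Receive Time" kept separately so the firewall-to-Panorama lag can
# still be measured. The vendor namespace used here is a placeholder.
generate_time = datetime(2017, 11, 14, 7, 4, 18, tzinfo=timezone.utc)
receive_time = datetime(2017, 11, 14, 7, 4, 27, tzinfo=timezone.utc)

event = {
    "@timestamp": generate_time.isoformat(),
    "event": {"created": receive_time.isoformat()},      # one possible reading of event.created
    "panw": {"receive_time": receive_time.isoformat()},  # placeholder custom field
}
lag_seconds = (receive_time - generate_time).total_seconds()  # ~9 s, as observed above
```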

@davidhowell-tx

Regarding the NAT fields not being covered by ECS at this time, I chose to store these values under the base network object, considering this information part of the communication path. Here is what I ended up calling them; I'd be curious to hear what other people are doing, or whether anyone thinks this isn't a logical place for the information:
network.nat.destination.ip
network.nat.destination.port
network.nat.source.ip
network.nat.source.port
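For illustration, that layout produces documents shaped like this (the nat fields are the names proposed above, not ECS-defined fields):

```python
# Shape of the custom (non-ECS) NAT fields described above, shown as a nested
# document. The values are made up.
doc = {
    "source": {"ip": "10.1.2.3", "port": 49152},
    "destination": {"ip": "203.0.113.9", "port": 443},
    "network": {
        "nat": {
            "source": {"ip": "198.51.100.7", "port": 30001},
            "destination": {"ip": "203.0.113.9", "port": 443},
        }
    },
}
```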

@zlammers

zlammers commented Mar 12, 2019

Timestamps:
I agree: generate time (observed time) is @timestamp. We maintain our own version of event.created (e.g., 'Logstash Time') and also compute (via a ruby filter in Logstash) a time_delta between the two, which is very helpful for spotting both TZ issues and data delay. In our current schema (similar-but-different to ECS) we keep only two primary timestamps, generated vs. ingested, but I've been thinking of adding two more: one for management systems (like Panorama, MLM, FAZ, etc.) and one for a SIEM if it's inline in the data flow as well. If we add those, we'd create time_deltas for each.

I'm also uncertain where to place these additional timestamps in the ECS schema. The best I can come up with is observer.manager.(timestamp|ip|hostname|vendor|version|etc.). However, I'm not sold on 'manager', as one could have a different vendor's box (e.g., a Balabit appliance, or a SIEM) in between, acting as an aggregation point.
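A rough Python equivalent of that time_delta computation (the original is a ruby filter inside Logstash; the field names and timestamp format here are assumptions):

```python
from datetime import datetime, timezone

# Illustrative re-implementation of the time_delta idea described above.
def time_delta_seconds(generated: str, ingested: str) -> float:
    """Seconds between the device-generated time and the pipeline-ingested time."""
    fmt = "%Y/%m/%d %H:%M:%S"  # PAN-OS style timestamp, assumed UTC here
    t_gen = datetime.strptime(generated, fmt).replace(tzinfo=timezone.utc)
    t_ing = datetime.strptime(ingested, fmt).replace(tzinfo=timezone.utc)
    return (t_ing - t_gen).total_seconds()

print(time_delta_seconds("2017/11/14 07:04:18", "2017/11/14 07:04:27"))  # 9.0
```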

NAT
As for NAT, we keep it nested under source/destination: source.nat.ip|port and destination.nat.ip|port. Same information, different structure. This is where I think it makes the most sense, as in my mind it's tied to source/destination more than to network.
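For comparison with the network.nat.* layout earlier in the thread, the same connection nested under source/destination looks like this (again illustrative; these were not ECS-defined fields at the time of writing):

```python
# Same NAT information, but nested under source/destination rather than
# network, as argued for here. Values are made up.
doc = {
    "source": {
        "ip": "10.1.2.3", "port": 49152,
        "nat": {"ip": "198.51.100.7", "port": 30001},
    },
    "destination": {
        "ip": "203.0.113.9", "port": 443,
        "nat": {"ip": "203.0.113.9", "port": 443},
    },
}
```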

@neu5ron

neu5ron commented Mar 23, 2019

Adding to the thread on timestamps.

For traffic logs, use Start Time as the @timestamp. This is when the session started, which matters because when correlating network records (flow, etc.) the time of the session is the most important.
This also holds closest to the base @timestamp field, whose description states: "Date/time when the event originated.
This is the date/time extracted from the event, typically representing when the event was generated by the source."
The generated time is when the log was created on the dataplane. These timestamps could be different, thus throwing off correlation of network logs.

For all other Palo logs (Threat, Config, System), use "Generated Time" (time_generated) as the @timestamp; there is really no other option that makes sense (Receive Time definitely does not).
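A compact way to express that per-log-type choice (illustrative Python; the PAN field names follow the syslog docs, and the fallback is an assumption):

```python
# Which PAN-OS field to use as @timestamp, per log type, following the
# reasoning above. Illustrative only.
TIMESTAMP_FIELD_BY_LOG_TYPE = {
    "TRAFFIC": "start",          # Start Time: when the session started
    "THREAT": "time_generated",  # Generated Time: when the log was created on the dataplane
    "CONFIG": "time_generated",
    "SYSTEM": "time_generated",
}

def pick_timestamp(log_type: str, record: dict) -> str:
    field = TIMESTAMP_FIELD_BY_LOG_TYPE.get(log_type, "time_generated")
    return record[field]
```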

NAT

I agree with a nested field and placing under source/destination. I think using network.nat doesn't allow, whoever is using the data, to make that clear distinction that the source.ip was NATed.

@neu5ron

neu5ron commented Mar 23, 2019

Log example

Excluding:
CSV field 46, Packets Received (pkts_received): the number of server-to-client packets for the session (available on all models except the PA-4000 Series), and field 47, Session End Reason (session_end_reason): the reason a session terminated.

<14>Feb 06 07:04:18 pa5060-mainpalo-fw 1,2017/11/14 07:04:18,111901000111,TRAFFIC,end,1,2017/11/14 07:04:18,1.17.1.23,2.0.0.2,0.0.0.0,0.0.0.0,Default_OUTBOUND,1,1,web-browsing,vsys2,internal-zone,external-zone,ethernet1/1.807,ethernet1/2.909,syslog-forwarding,1,134045,1,10951,80,0,0,0x1c,tcp,allow,1158,646,512,10,2017/11/14 07:04:00,15,any,1,8470167894323,0x0,US,US,1

Session Start Time is the 36th CSV field.
I left out CSV columns 46-54.
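A quick, illustrative way to pull the session start out of the line above (it relies on the syslog header containing no commas, and on the field values themselves being comma-free, which holds for this sample):

```python
# Split the sample TRAFFIC line on commas and take the 36th CSV field
# (Start Time). The first token is the syslog header plus CSV field 1.
line = ('<14>Feb 06 07:04:18 pa5060-mainpalo-fw 1,2017/11/14 07:04:18,111901000111,'
        'TRAFFIC,end,1,2017/11/14 07:04:18,1.17.1.23,2.0.0.2,0.0.0.0,0.0.0.0,'
        'Default_OUTBOUND,1,1,web-browsing,vsys2,internal-zone,external-zone,'
        'ethernet1/1.807,ethernet1/2.909,syslog-forwarding,1,134045,1,10951,80,0,0,'
        '0x1c,tcp,allow,1158,646,512,10,2017/11/14 07:04:00,15,any,1,8470167894323,'
        '0x0,US,US,1')

fields = line.split(',')
session_start = fields[35]  # 36th CSV field (1-indexed): Start Time
print(session_start)        # 2017/11/14 07:04:00
```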

@dainperkins
Contributor

On NAT:

[source/destination].nat is necessary, I'd say (compared to just a single network nat nest):

For any given connection there could be source and destination NATs at the same time (I've done both on weird VPN setups).

@neu5ron

neu5ron commented Apr 5, 2019

Agree on NAT, especially with firewall (Palo Alto) use cases.
Also, another example: I have seen Cisco ASA VPN logs that have 5 distinct IPs: 1 src, 1 dst, 1 src NAT, 1 dst NAT, and 1 originating IP for the VPN session.

@davidhowell-tx

davidhowell-tx commented Apr 12, 2019

Quoting @neu5ron above:

> For traffic logs, use Start Time as the @timestamp. This is when the session started, which matters because when correlating network records (flow, etc.) the time of the session is the most important. This also holds closest to the base @timestamp field, whose description states: "Date/time when the event originated. This is the date/time extracted from the event, typically representing when the event was generated by the source." The generated time is when the log was created on the dataplane. These timestamps could be different, thus throwing off correlation of network logs.

May want some consensus or clarification on this. Here's the way I interpret these fields.

@timestamp represents when the log was generated on the source (on the data plane):

> This is the date/time extracted from the event, typically representing when the event was generated by the source.

event.start records when the session started:

> event.start contains the date when the event started or when the activity was first observed.

event.end records when the session ended:

> event.end contains the date when the event ended or when the activity was last observed.

Correct me if I'm wrong, but Palo Alto generates the log for the session after the session ends?

If the session start and end time are vastly different, it's really a question of what information is most important.

  • Do you want the event to appear in your timeline when it was actually generated?

  • Or do you want it to appear when the session was initiated or ended?

I think for a single event, leaving Generate Time as @timestamp and using event.start and event.end may be best, but I agree that the session start time is also important to represent. To do that, I'm more inclined to clone the event and transform it a bit, so that one copy has @timestamp set to the session start and another has @timestamp set to the session end. I definitely have to think about this one more.
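As a rough sketch of that clone-and-transform idea (illustrative only; the helper and the tags are made up, and the input is assumed to already carry event.start/event.end):

```python
import copy

# Emit one copy of the event with @timestamp set to the session start and
# another with @timestamp set to the session end, tagging each so the copies
# can be told apart. Illustrative sketch, not an official recipe.
def split_by_session_bounds(event: dict) -> list:
    copies = []
    for bound in ("start", "end"):
        ts = event.get("event", {}).get(bound)
        if not ts:
            continue
        clone = copy.deepcopy(event)
        clone["@timestamp"] = ts
        clone.setdefault("tags", []).append(f"session-{bound}")  # hypothetical marker
        copies.append(clone)
    return copies or [event]
```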

@dainperkins
Contributor

dainperkins commented Apr 12, 2019 via email

@webmat
Contributor

webmat commented Apr 17, 2019

A clarification on the timestamps.

So in cases like this, where an event describes a longer lived phenomenon like a network flow, @timestamp really is more metadata about when the event was first published / logged. I think it corresponds to "Generate Time" here. This is unlikely to change, as it's the definition that makes most sense across all of the very different kinds of data sources.

Then the start/end of the flow, if you have both (or one timestamp plus a duration to compute the other), go in event.start and event.end.

I find event.created a bit of a misnomer, as it's another bit of metadata that's meant to hold the first time this event was seen by one's monitoring pipeline. I don't like it because .created also sounds like "Generate Time". Another reason I don't like it is that it becomes moot when an analyst wants to add meta-data at multiple steps in their pipeline (times and errors). So for now event.created is the place where you put the time at which your first agent in line received the event. Nothing more.

Visualizing long-lived events like flows ordered by start time / end time can indeed be tricky. But you can sort searches based on any timestamp you have, and you can do time histograms based on any timestamp you have as well. However, this means all of the searches / visualizations have to be adjusted to use event.start (as an example). There's currently no easy way to toggle a dashboard from one timestamp field to another.
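For example, when only one boundary plus a duration is available, the other boundary can be computed; the values below come from the sample TRAFFIC line earlier in the thread, and the nanosecond unit for event.duration is my reading of the ECS field description:

```python
from datetime import datetime, timedelta, timezone

# Illustrative: derive event.end from event.start plus the session's elapsed
# time. Start Time and Elapsed Time are taken from the sample line above.
start = datetime(2017, 11, 14, 7, 4, 0, tzinfo=timezone.utc)  # Start Time
elapsed_seconds = 15                                          # Elapsed Time

end = start + timedelta(seconds=elapsed_seconds)

fields = {
    "event.start": start.isoformat(),
    "event.end": end.isoformat(),
    "event.duration": elapsed_seconds * 1_000_000_000,  # ECS duration is in nanoseconds
}
```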

@andrewthad
Contributor

Thanks for this reference material. It has been helpful. I just wanted to highlight three of the suggestions for additional fields made in the document:

  • source.zone and destination.zone
  • url.category
  • observer.ruleset

I don't particularly care what they are named (the names with zone in them seem pretty good though), but these fields are important for our SOC engineers. Lots of queries use these fields. It would be nice if they could be included in a future version of ECS.
