
Example Mappings of two Palo Alto log sources to ECS 1.0.0-beta2 #352

Closed
MikePaquette opened this issue Mar 1, 2019 · 14 comments
Labels: discuss, mapping, use case

@MikePaquette
Contributor

The attached Excel file proposes a logical mapping of pan_traffic and pan_threat logs to ECS 1.0.0-beta2.

The Palo Alto field definitions were obtained from:

As a reminder, in ECS, an inline firewall device takes the role of "observer" as shown below:
(diagram: an inline firewall acting as the ECS observer)

Notes:

  • PAN devices can generate logs in various logging formats. This mapping is based on the Syslog Field Definitions.
  • This mapping is not an official part of ECS; it is simply offered as an example of how a logical mapping of a commonly used security device could be performed in ECS.
  • This example does not contain an index mapping template, an ingest node pipeline, or a Logstash configuration. Creating those is left to you :-) (a rough sketch of the field-renaming step follows this list).
  • There are columns in this spreadsheet for "enhancement request fields", that is, fields that are not currently defined by ECS but would likely be useful in the future.
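To give a flavor of the field-renaming step mentioned above, here is a minimal Python sketch. It is illustrative only: the input record and the handful of mappings shown are not an official ECS artifact; the attached spreadsheet is the actual proposal.

```python
# Hypothetical sketch: rename a few parsed PAN-OS traffic fields to ECS names.
# The input dict and the chosen mappings are examples only; consult the
# attached spreadsheet for the proposed logical mapping.

PAN_TO_ECS = {
    "src": "source.ip",
    "dst": "destination.ip",
    "sport": "source.port",
    "dport": "destination.port",
    "bytes_sent": "source.bytes",
    "bytes_received": "destination.bytes",
    "action": "event.action",
}

def to_ecs(pan_record: dict) -> dict:
    """Return a flat dict keyed by ECS field names (dotted notation)."""
    return {ecs: pan_record[pan] for pan, ecs in PAN_TO_ECS.items() if pan in pan_record}

example = {"src": "10.1.2.3", "dst": "203.0.113.9", "sport": 49152, "dport": 443, "action": "allow"}
print(to_ecs(example))
# {'source.ip': '10.1.2.3', 'destination.ip': '203.0.113.9',
#  'source.port': 49152, 'destination.port': 443, 'event.action': 'allow'}
```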

ECS_Mapping_Examples_PAN_1.0.0-beta2.xlsx

Questions and feedback are welcome: on the mappings themselves, on the value of such logical mappings, and on whether such logical mappings might be useful as part of the repo in the future.

@webmat
Contributor

webmat commented Mar 1, 2019

Love this.

I think it would be great to have a place where we collect and curate examples with supporting files (like this) publicly. Perhaps on the main website, perhaps not.

In the meantime, I think it makes sense to use the use case GitHub label -- which we will specifically curate -- to pull them out of GitHub issues.

@webmat
Contributor

webmat commented Mar 1, 2019

Looking at the spreadsheet, and the Palo Alto docs, would it be worth having a column for the programmatic field name? What's in parens on their doc is the programmatic field name, correct?

@webmat
Contributor

webmat commented Mar 1, 2019

Another meta comment: having a sample log file attached here would be great :-)

@davidhowell-tx

Was just looking this over, and I think the time fields need to be modified.

"Generate Time" is when the event occurred and was originally logged on the firewall that observed the event, whereas "Received Time" is the time that the event was received by the management system (i.e. Panorama, if you're using it). In my logs I'm seeing Generate Time on average 9 seconds earlier than Receive Time.

Therefore, I believe "Generate Time" should be @timestamp.

I'm unsure where to place "Receive Time". I feel like event.created would be the original @timestamp value before it's replaced by "Generate Time" so you can track the overall time from source to Elasticsearch, but it could also be helpful to know "Receive Time" to determine if the discrepancy is from source firewall to management server (Panorama) or from management server to Elasticsearch.
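For illustration only (not part of the proposal), this is one way to express that choice once the PAN timestamps have been parsed; the sample values and the custom field for Receive Time are made up:

```python
from datetime import datetime, timezone

# Illustrative: "Generate Time" -> @timestamp (when the firewall logged the
# event); "Receive Time" kept separately so the firewall-to-Panorama lag can
# still be measured. The vendor namespace used here is a placeholder.
generate_time = datetime(2017, 11, 14, 7, 4, 18, tzinfo=timezone.utc)
receive_time = datetime(2017, 11, 14, 7, 4, 27, tzinfo=timezone.utc)

event = {
    "@timestamp": generate_time.isoformat(),
    "event": {"created": receive_time.isoformat()},      # one possible reading of event.created
    "panw": {"receive_time": receive_time.isoformat()},  # placeholder custom field
}
lag_seconds = (receive_time - generate_time).total_seconds()  # ~9 s, as observed above
```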

@davidhowell-tx

Regarding the NAT fields not being covered by ECS at this time, I chose to store these values under the base network object, considering this information part of the communication path. Here is what I ended up calling them; I'd be curious to hear what other people are doing, or whether anyone thinks this isn't a logical place for the information:
network.nat.destination.ip
network.nat.destination.port
network.nat.source.ip
network.nat.source.port
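For illustration, that layout produces documents shaped like this (the nat fields are the names proposed above, not ECS-defined fields):

```python
# Shape of the custom (non-ECS) NAT fields described above, shown as a nested
# document. The values are made up.
doc = {
    "source": {"ip": "10.1.2.3", "port": 49152},
    "destination": {"ip": "203.0.113.9", "port": 443},
    "network": {
        "nat": {
            "source": {"ip": "198.51.100.7", "port": 30001},
            "destination": {"ip": "203.0.113.9", "port": 443},
        }
    },
}
```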

@zlammers

zlammers commented Mar 12, 2019

Timestamps:
I agree: generate time (observed time) is @timestamp. We maintain our own version of event.created (e.g., 'Logstash Time') and also compute (via a ruby filter in Logstash) a time_delta between the two, which is very helpful for spotting both TZ issues and data delay. In our current schema (similar-but-different to ECS) we keep only two primary timestamps, generated vs. ingested, but I've been thinking of adding two more: one for management systems (like Panorama, MLM, FAZ, etc.) and one for a SIEM if it's inline in the data flow as well. If we add those, we'd create time_deltas for each.

I'm also uncertain where to place these additional timestamps in the ECS schema. The best I can come up with is observer.manager.(timestamp|ip|hostname|vendor|version|etc.). However, I'm not sold on 'manager', as one could have a different vendor's box (e.g., a Balabit appliance, or a SIEM) in between, acting as an aggregation point.
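A rough Python equivalent of that time_delta computation (the original is a ruby filter inside Logstash; the field names and timestamp format here are assumptions):

```python
from datetime import datetime, timezone

# Illustrative re-implementation of the time_delta idea described above.
def time_delta_seconds(generated: str, ingested: str) -> float:
    """Seconds between the device-generated time and the pipeline-ingested time."""
    fmt = "%Y/%m/%d %H:%M:%S"  # PAN-OS style timestamp, assumed UTC here
    t_gen = datetime.strptime(generated, fmt).replace(tzinfo=timezone.utc)
    t_ing = datetime.strptime(ingested, fmt).replace(tzinfo=timezone.utc)
    return (t_ing - t_gen).total_seconds()

print(time_delta_seconds("2017/11/14 07:04:18", "2017/11/14 07:04:27"))  # 9.0
```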

NAT
As for NAT, we keep it nested under source/destination: source.nat.ip|port and destination.nat.ip|port. Same information, different structure. This is where I think it makes the most sense, as in my mind it's tied to source/destination more than to network.
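For comparison with the network.nat.* layout earlier in the thread, the same connection nested under source/destination looks like this (again illustrative; these were not ECS-defined fields at the time of writing):

```python
# Same NAT information, but nested under source/destination rather than
# network, as argued for here. Values are made up.
doc = {
    "source": {
        "ip": "10.1.2.3", "port": 49152,
        "nat": {"ip": "198.51.100.7", "port": 30001},
    },
    "destination": {
        "ip": "203.0.113.9", "port": 443,
        "nat": {"ip": "203.0.113.9", "port": 443},
    },
}
```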

@neu5ron

neu5ron commented Mar 23, 2019

Adding to the thread on timestamps.

For traffic logs, use Start Time as the @timestamp. This is when the session started, which matters because when correlating network records (flow, etc.) the time of the session is the most important.
This also holds closest to the base @timestamp field, whose description states: "Date/time when the event originated.
This is the date/time extracted from the event, typically representing when the event was generated by the source."
The generated time is when the log was created on the dataplane. These timestamps could be different, thus throwing off correlation of network logs.

For all other Palo logs (Threat, Config, System), use "Generated Time" (time_generated) as the @timestamp; there is really no other option that makes sense (Receive Time definitely does not).
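A compact way to express that per-log-type choice (illustrative Python; the PAN field names follow the syslog docs, and the fallback is an assumption):

```python
# Which PAN-OS field to use as @timestamp, per log type, following the
# reasoning above. Illustrative only.
TIMESTAMP_FIELD_BY_LOG_TYPE = {
    "TRAFFIC": "start",          # Start Time: when the session started
    "THREAT": "time_generated",  # Generated Time: when the log was created on the dataplane
    "CONFIG": "time_generated",
    "SYSTEM": "time_generated",
}

def pick_timestamp(log_type: str, record: dict) -> str:
    field = TIMESTAMP_FIELD_BY_LOG_TYPE.get(log_type, "time_generated")
    return record[field]
```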

NAT

I agree with a nested field and placing under source/destination. I think using network.nat doesn't allow, whoever is using the data, to make that clear distinction that the source.ip was NATed.

@neu5ron

neu5ron commented Mar 23, 2019

Log example

Excluding:
CSV field 46, Packets Received (pkts_received): the number of server-to-client packets for the session (available on all models except the PA-4000 Series), and field 47, Session End Reason (session_end_reason): the reason a session terminated.

<14>Feb 06 07:04:18 pa5060-mainpalo-fw 1,2017/11/14 07:04:18,111901000111,TRAFFIC,end,1,2017/11/14 07:04:18,1.17.1.23,2.0.0.2,0.0.0.0,0.0.0.0,Default_OUTBOUND,1,1,web-browsing,vsys2,internal-zone,external-zone,ethernet1/1.807,ethernet1/2.909,syslog-forwarding,1,134045,1,10951,80,0,0,0x1c,tcp,allow,1158,646,512,10,2017/11/14 07:04:00,15,any,1,8470167894323,0x0,US,US,1

Session Start Time is the 36th CSV field.
I left out CSV columns 46-54.
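A quick, illustrative way to pull the session start out of the line above (it relies on the syslog header containing no commas, and on the field values themselves being comma-free, which holds for this sample):

```python
# Split the sample TRAFFIC line on commas and take the 36th CSV field
# (Start Time). The first token is the syslog header plus CSV field 1.
line = ('<14>Feb 06 07:04:18 pa5060-mainpalo-fw 1,2017/11/14 07:04:18,111901000111,'
        'TRAFFIC,end,1,2017/11/14 07:04:18,1.17.1.23,2.0.0.2,0.0.0.0,0.0.0.0,'
        'Default_OUTBOUND,1,1,web-browsing,vsys2,internal-zone,external-zone,'
        'ethernet1/1.807,ethernet1/2.909,syslog-forwarding,1,134045,1,10951,80,0,0,'
        '0x1c,tcp,allow,1158,646,512,10,2017/11/14 07:04:00,15,any,1,8470167894323,'
        '0x0,US,US,1')

fields = line.split(',')
session_start = fields[35]  # 36th CSV field (1-indexed): Start Time
print(session_start)        # 2017/11/14 07:04:00
```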

@dainperkins
Contributor

On NAT:

[source/destination].nat is necessary, I'd say (compared to just a single network nat nest):

For any given connection there could be source and destination NATs at the same time (I've done both on weird VPN setups).

@neu5ron

neu5ron commented Apr 5, 2019

Agree on NAT, especially with firewall (Palo Alto) use cases.
Also, another example: I have seen Cisco ASA VPN logs that have 5 distinct IPs: 1 src, 1 dst, 1 src NAT, 1 dst NAT, and 1 originating IP for the VPN session.

@davidhowell-tx

davidhowell-tx commented Apr 12, 2019

Quoting @neu5ron above:

> For traffic logs, use Start Time as the @timestamp. This is when the session started, which matters because when correlating network records (flow, etc.) the time of the session is the most important. This also holds closest to the base @timestamp field, whose description states: "Date/time when the event originated. This is the date/time extracted from the event, typically representing when the event was generated by the source." The generated time is when the log was created on the dataplane. These timestamps could be different, thus throwing off correlation of network logs.

May want some consensus or clarification on this. Here's the way I interpret these fields.

@timestamp represents when the log was generated on the source (on the data plane):

> This is the date/time extracted from the event, typically representing when the event was generated by the source.

event.start records when the session started:

> event.start contains the date when the event started or when the activity was first observed.

event.end records when the session ended:

> event.end contains the date when the event ended or when the activity was last observed.

Correct me if I'm wrong, but Palo Alto generates the log for the session after the session ends?

If the session start and end time are vastly different, it's really a question of what information is most important.

  • Do you want the event to appear in your timeline when it was actually generated?

  • Or do you want it to appear when the session was initiated or ended?

I think for a single event, leaving Generate Time as @timestamp and using event.start and event.end may be best, but I agree that the session start time is also important to represent. To do that, I'm more inclined to clone the event and transform it a bit, so that one copy has @timestamp set to the session start and another has @timestamp set to the session end. I definitely have to think about this one more.
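As a rough sketch of that clone-and-transform idea (illustrative only; the helper and the tags are made up, and the input is assumed to already carry event.start/event.end):

```python
import copy

# Emit one copy of the event with @timestamp set to the session start and
# another with @timestamp set to the session end, tagging each so the copies
# can be told apart. Illustrative sketch, not an official recipe.
def split_by_session_bounds(event: dict) -> list:
    copies = []
    for bound in ("start", "end"):
        ts = event.get("event", {}).get(bound)
        if not ts:
            continue
        clone = copy.deepcopy(event)
        clone["@timestamp"] = ts
        clone.setdefault("tags", []).append(f"session-{bound}")  # hypothetical marker
        copies.append(clone)
    return copies or [event]
```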

@dainperkins
Contributor

dainperkins commented Apr 12, 2019 via email

@webmat
Contributor

webmat commented Apr 17, 2019

A clarification on the timestamps.

So in cases like this, where an event describes a longer lived phenomenon like a network flow, @timestamp really is more metadata about when the event was first published / logged. I think it corresponds to "Generate Time" here. This is unlikely to change, as it's the definition that makes most sense across all of the very different kinds of data sources.

Then the start/end of the flow, if you have both (or one timestamp plus a duration to compute the other), go in event.start and event.end.

I find event.created a bit of a misnomer, as it's another bit of metadata that's meant to hold the first time this event was seen by one's monitoring pipeline. I don't like it because .created also sounds like "Generate Time". Another reason I don't like it is that it becomes moot when an analyst wants to add meta-data at multiple steps in their pipeline (times and errors). So for now event.created is the place where you put the time at which your first agent in line received the event. Nothing more.

Visualizing long-lived events like flows ordered by start time / end time can indeed be tricky. But you can sort searches based on any timestamp you have, and you can do time histograms based on any timestamp you have as well. However, this means all of the searches / visualizations have to be adjusted to use event.start (as an example). There's currently no easy way to toggle a dashboard from one timestamp field to another.
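For example, when only one boundary plus a duration is available, the other boundary can be computed; the values below come from the sample TRAFFIC line earlier in the thread, and the nanosecond unit for event.duration is my reading of the ECS field description:

```python
from datetime import datetime, timedelta, timezone

# Illustrative: derive event.end from event.start plus the session's elapsed
# time. Start Time and Elapsed Time are taken from the sample line above.
start = datetime(2017, 11, 14, 7, 4, 0, tzinfo=timezone.utc)  # Start Time
elapsed_seconds = 15                                          # Elapsed Time

end = start + timedelta(seconds=elapsed_seconds)

fields = {
    "event.start": start.isoformat(),
    "event.end": end.isoformat(),
    "event.duration": elapsed_seconds * 1_000_000_000,  # ECS duration is in nanoseconds
}
```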

@andrewthad
Contributor

Thanks for this reference material. It has been helpful. I just wanted to highlight three of the suggestions for additional fields made in the document:

  • source.zone and destination.zone
  • url.category
  • observer.ruleset

I don't particularly care what they are named (the names with zone in them seem pretty good though), but these fields are important for our SOC engineers. Lots of queries use these fields. It would be nice if they could be included in a future version of ECS.
