Add Additional Sample Data Sets #21604
Comments
I like the idea of the ecommerce one - it would be really cool if we could get some geo data in there. |
If we can ensure the ecommerce one matches the structure we use in our current ecommerce data, we can reuse the Canvas workpads that design has been helping with. They are very polished and will be maintained moving forward. Agreed also re the examples repo: this would move us to a more maintainable state. |
I agree with @alexfrancoeur about limiting the effort to 3 for now. I would vote for eCommerce as 2nd (first being flights). With Canvas in 6.5, huge ++ for shipping with prebuilt Canvas workpads on these datasets. If possible I would bake an anomaly or 3 into the datasets. This will make ML docs and tutorials super simple. |
@gingerwizard @jamiesmith Regarding the ecommerce data set, this was something I threw together for our Business Analytics instruction set. I'm not familiar enough with the current ecommerce data set but would gladly replace the one I came up with. The flights data set is ~14k docs and, I think, 6 weeks' worth of data. I was hoping to do something similar for other data sets. Is there any way we can get a snippet of that data set in newline-delimited JSON format? If we have the mappings and saved objects (including the Canvas workpad) exported, we could easily open a PR for a new sample data set.
@asawariS do you think we'd want to use the actual module dashboards? Like literally re-use the Apache or Nginx data / module? Or introduce a more custom one? The only downside to re-using a beat dashboard is that we'd have to maintain it and make sure the dashboards are in sync. Any preference on data set? Could we borrow one from the examples repo?
++ At least for the flight sample data and my ecommerce sample data, these are controlled by a script. If we can define the anomalies we want, we can easily create them. If we're baking the anomalies in, it'd be great to bake the ML jobs in as well. I don't believe they use the saved object service today though, so it may not be possible. Worth looking into though. |
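To make the "script-controlled data with baked-in anomalies" idea concrete, here is a minimal sketch in Python. It is not the script used for the flights or ecommerce data; the function names, the price range, and the 10x anomaly multiplier are all hypothetical. The field names `order_date` and `taxless_total_price` are borrowed from the fields mentioned elsewhere in this thread. It generates synthetic docs, inflates totals on chosen days, and serializes the result as newline-delimited JSON.

```python
import json
import random
from datetime import datetime, timedelta


def generate_orders(n_docs, start, days, anomaly_days, seed=0):
    """Yield synthetic order docs (hypothetical schema).

    Docs that fall on a day offset listed in `anomaly_days` get a total
    that is 10x larger, which is the "baked-in" anomaly.
    """
    rng = random.Random(seed)  # seeded so the data set is reproducible
    for _ in range(n_docs):
        day = rng.randrange(days)
        order_date = start + timedelta(days=day, seconds=rng.randrange(86400))
        price = round(rng.uniform(10, 100), 2)
        if day in anomaly_days:
            price = round(price * 10, 2)  # assumed anomaly shape: a price spike
        yield {
            "order_date": order_date.isoformat(),
            "taxless_total_price": price,
        }


def to_ndjson(docs):
    """Serialize docs as newline-delimited JSON, one document per line."""
    return "".join(json.dumps(doc) + "\n" for doc in docs)
```

For example, `to_ndjson(generate_orders(14000, datetime(2018, 1, 1), 42, {10, 25}))` would produce roughly the size discussed above (about 14k docs over 6 weeks) with spikes on two days.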
@alexfrancoeur I have included a sample doc and workpad below [1]. I believe all of it is synthetic data except the manufacturer, SKU, product name and prices. I think product name is generic enough that it can stay in. SKU and manufacturer need to be scrambled (since the product catalog comes from Zalando, and I don't think we have rights to distribute). My guess is that it will be easier to stick with your toy dataset and focus on improving dashboards using themes from the cyclops demo. @jamiesmith can you action this and respond on this thread? Loop in @EthanStrider as needed since he did a bunch of work on the Canvas workpad for the business analytics demo.
@alexfrancoeur I was thinking 3-fold: 1. Include the default dashboard that ships with modules. 2. Include a customized one (with a note that says so, and that also references the default as well as the add data instructions). 3. Include a Canvas workpad, because not many people expect presentation-style dashboards for infra data, so it goes above and beyond. We have an Apache one in the examples repo that could be borrowed, but there may be GDPR constraints for in-product use. But, if we decide to go with Apache, we have a good IP hashing script that can do the trick. [1]
|
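For readers unfamiliar with IP hashing, the following is a rough sketch of one common approach. It is not the script referred to above. The function name `pseudonymize_ip` and the choice of HMAC-SHA256 are assumptions made for illustration. Each IPv4 address is mapped to a stable, fake IPv4 address with a keyed hash, so per-client patterns in the logs survive while the real addresses do not appear in the shipped data.

```python
import hashlib
import hmac
import ipaddress


def pseudonymize_ip(ip, secret):
    """Map an IPv4 address to a stable, fake IPv4 address (hypothetical helper).

    `secret` is a bytes key that should not ship with the data; without it,
    the small IPv4 space would make a plain hash easy to reverse by brute force.
    """
    packed = ipaddress.IPv4Address(ip).packed  # validates and packs to 4 bytes
    digest = hmac.new(secret, packed, hashlib.sha256).digest()
    # The first 4 bytes of the digest become the replacement address.
    return str(ipaddress.IPv4Address(digest[:4]))
```

The same input and key always yield the same output, so visit counts per client stay intact. Whether this is sufficient for a given privacy requirement is a separate question that this sketch does not settle.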
I set up a quick call to chat about this tomorrow. I have a greatly cut down set of data, but we might actually want more than just the bare bones |
Things we would need to do to use that data:
Ethan is using: category, customer_gender, order_date, taxless_total_price |
Quick update from Jamie and me
|
Started a PR for logs sample data #22276 |
@asawariS @jamiesmith first pass at a flight sample data workpad. Will add other pages but I'll probably use this as part of my Elastic{ON} tour presentation. Logs up next |
sooooooo good @alexfrancoeur. I will come back with more questions/comments, if any, later. quick question: where are the visual assets from? Assuming we have the rights to the images. |
@asawariS we do not, at least not without acknowledging the artist(s) somewhere. That's something we need to talk about. I'm building out prototypes but I think we'll need input from design for some of these assets if we're going to package them with Kibana. |
Alex, take a look at the blog doc. There is an image section that talks about getting appropriately licensed artwork. |
@alexfrancoeur for this we can consider getting our own design team to create something for this purpose. We have done that for past Canvas projects. We share rough mockups / concepts, and design builds something to align with the ask. Examples: |
@alexfrancoeur that's excellent!! |
Logs are in (sans workpad), eCommerce next |
@asawariS I'll open some Design issues shortly |
Opened the following issues to track these sample data workpads. Design asset issues coming soon. [Sample Data] Add Canvas Workpad for Flight Data #22891 Goal is to submit a PR for eCommerce sample data this week. |
Closing. Web logs and eCommerce data sets have been merged. Sample data dashboards are in separate issues linked above. |
With the new splash screen (#21353, #18828), we will be surfacing sample data as one of the first things you see when you enter Kibana in 6.5. In order for this feature to be meaningful, it would be better to have more than one sample data set.
These could be more "fun" and generic sample data sets like Flights or use case driven such as logging, metrics, etc. I think a total of 3 might be a good start here. While these data sets are small, we should still be cognizant of what we're adding to Kibana.
I know we have discussed waiting to see adoption of Flights before adding more data sets, but even with the more recent changes to the home page (#20953), sample data is still a bit hidden. We also will not have sample data telemetry in 6.4 (#19319). This splash screen will make sample data much more prominent and is important for Cloud trial users / adoption of the stack. It would also make sense to align with demo.elastic.co needs in order to avoid duplicate work.
Here are some (brief) initial thoughts, but I'd like to discuss what options might be best within this issue.
Would love to hear any ideas
cc: @gingerwizard @asawariS @EthanStrider @AlonaNadler @jamiesmith @epixa @jimgoodwin @rayafratkina @nreese