<img src="https://techcrunch.com/wp-content/uploads/2015/08/googleanalytics.jpg" alt="corona" width="700">

<br>
I thought it might be helpful to share some thoughts on best practices for challenges involving analytic reporting. My opinions come from experience being both the person reading reports and the person making reports - and also the person told to improve and resubmit his report. Of course this notebook represents a single point of view, and you are probably best served by gathering many perspectives of what "good" looks like.




# 1. Lead with the insights.

In case you read no further, I offer this idea as the one thing to make your report more impactful:

> #### Lead with the best insights from your analysis. Put the bottom line up front.

This point is important, and not natural for scientists. We like to take the audience through our line of reasoning, with open minds, supported by data and charts with a bit of explanation. Once the audience has followed along we present our conclusions.

That approach is often a good one, but here I recommend almost the opposite! Your audience consists of people looking for vital information in a limited amount of time. They'll be reading through a stack of reports looking for something that seems useful. So it's best to start with your conclusion, get the reader's interest, and give them a frame of reference from which to "hang" supporting information. It's actually easier for someone to digest information when they know the overall storyline. Once you've set it up, you can then explain the details and show them how you got there.

Here's the beginning of my report from the 2019 NFL Punt Analytics challenge:

<hr>

> ### Summary

> This report represents my analysis for the NFL Punt Analytics Competition. It is my opinion that based on the data provided, changing two rules for punt plays could result in up to 8 fewer concussions per year. The two proposed changes are as follows:

>  - <i>Move the ball forward 10 yards after a Fair Catch.</i> After fair catch of a scrimmage kick, play would start 10 yards forward from the spot of catch, or the succeeding spot after enforcement of penalties. This would apply when the receiving team elects to put the ball in play by a snap.

>  - <i>Penalize blindside blocks with forcible contact to any part of a defenseless player's body.</i> Defenseless players as described by NFL rules include players receiving a blindside block from a blocker whose path is toward or parallel to his own end line. Prohibited contact for punt plays would include blindside blocks with forcible contact to any part of a defenseless player's body, including below the neck.

> The figure below shows the potential reduction in concussions based on 2016-2017 data and associated assumptions.


<img src="https://s3.amazonaws.com/nonwebstorage/headstrong/redux.png" alt="chart" height="400" width="600">


You can see there's minimal setup and then we get right to the point. The rest of the report goes into each recommendation in more detail. 

For the CDP challenge, you might consider starting right away with your KPIs and why they will be useful.

It doesn't seem too different for the NFL challenge. A strong start might include your recommendations on a new metric that measures or predicts defense performance.

# 2.  Focus your analysis.

My own preference is for a focused analysis that thoroughly explores one part of a problem, vs. an analysis that touches on several things. There are many directions you can go and only 2-3 months to deliver a report!

CDP poses four questions in the intro, which seem like a good place to start. You might pick your favorite question, break it down a bit further, and use that as your overall theme.

 - How do you help cities adapt to a rapidly changing climate amidst a global pandemic, but do it in a way that is socially equitable?
 - What are the projects that can be invested in that will help pull cities out of a recession, mitigate climate issues, but not perpetuate racial/social inequities?
 - What are the practical and actionable points where city and corporate ambition join, i.e. where do cities have problems that corporations affected by those problems could solve, and vice versa?
 - How can we measure the intersection between environmental risks and social equity, as a contributor to resiliency?
 
 
 
 The NFL poses several interesting questions:
 - What are coverage schemes (man, zone, etc) that the defense employs? What coverage options tend to be better performing?
 - Which players are the best at closely tracking receivers as they try to get open?
 - Which players are the best at closing on receivers when the ball is in the air?
 - Which players are the best at defending pass plays when the ball arrives?
 - Is there any way to use player tracking data to predict whether or not certain penalties – for example, defensive pass interference – will be called?
 - Who are the NFL’s best players against the pass?
 - How does a defense react to certain types of offensive plays?
 - Is there anything about a player – for example, their height, weight, experience, speed, or position – that can be used to predict their performance on defense?
 

I think a winning report will identify what matters most to the target audience. Domain research is helpful here to identify the most pressing issues and understand what different stakeholders want.


# 3. Spend time on your visualizations. 

The art and science of visualization has become a field of study on its own, with many, many things to learn. For now, here are some things you can consider to help your visualizations reinforce your message.

- Keep it simple. Your audience should understand what they're looking at within a few seconds.
- Use color to reinforce your message. Color is one of the "preattentive attributes" that people pick up on right away. A bar chart with mostly gray bars and one bar highlighted in color is a great way to focus attention. Along those lines, try not to use default seaborn bar charts with the rainbow of colors. Too much color can be worse than no color.
- Tell the reader what to take away from the chart. That might not sound right at first, but your chart should reinforce your data-driven conclusions. It's ok to put that point front and center on the page. The audience can ask questions and make other points if they see things differently.
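
Here's a minimal matplotlib sketch of the color and takeaway points above. The categories and counts are made up purely for illustration, and `highlight_colors` is a hypothetical helper I wrote for this example, not a library function. Treat it as one possible starting point rather than the right way to do it.

```python
import matplotlib.pyplot as plt

# Hypothetical data, invented only to illustrate the idea.
categories = ["A", "B", "C", "D", "E"]
values = [12, 15, 31, 14, 11]


def highlight_colors(values, highlight="tab:orange", muted="lightgray"):
    """Return one color per bar: the largest value highlighted, the rest muted."""
    top = max(range(len(values)), key=values.__getitem__)
    return [highlight if i == top else muted for i in range(len(values))]


colors = highlight_colors(values)

fig, ax = plt.subplots(figsize=(6, 3.5))
bars = ax.bar(categories, values, color=colors)

# Put the takeaway in the title instead of just naming the variables.
ax.set_title("Category C has roughly twice the count of any other category",
             loc="left")
ax.set_ylabel("Count")

# Remove chart clutter that doesn't carry information.
for side in ("top", "right"):
    ax.spines[side].set_visible(False)

fig.tight_layout()
```

In a notebook the figure shows up below the cell. The gray bars give context, the single colored bar carries the message, and the title says out loud what the reader should notice.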

Here's an example from Tom Bresee's [Next-Gen EDA](https://www.kaggle.com/tombresee/next-gen-eda) notebook for the NFL Databowl competition. It uses a super-clean layout with great use of color. I find it much more effective than a regular old bar chart. In my opinion it is one of the best examples on Kaggle of a minimal, effective visualization.


<img src="https://i.imgur.com/cCgC2NM.png" alt="schedule" width="600" height="500">

<br>
I'm also a big fan of interactive charts. Plotly and Bokeh are good choices. Using a wrapper like HoloViews or hvPlot makes interactive charts even easier. Here's a simple line chart using HoloViews.


In [None]:
from glob import glob
import pandas as pd
import holoviews as hv

hv.extension('bokeh')

In [None]:
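# Collect the yearly corporate climate-change disclosure files (those matching 20*.csv).
files = glob("../input/cdp-unlocking-climate-solutions/Corporations/"
             "Corporations Disclosing/Climate Change/20*.csv")
files.sort()

# Count the rows in each file; reading a single column keeps this quick.
corp_count = []
for file in files:
    respondents = pd.read_csv(file, usecols=['account_number']).shape[0]
    corp_count.append(respondents)

# Assumes the sorted files correspond to 2018, 2019 and 2020, in that order.
df = pd.DataFrame(corp_count, columns=['count']) \
        .assign(year=range(2018,2021))

# Shared options for the line chart.
chart_opts = {'tools': ['hover'],
              'width': 500,
              'xticks': list(range(2018,2021)),
              'yticks': list(range(0,1100,200)),
              'ylim': (0, 1200),
              'padding': 0.1,
              'title': "Corporate Respondents"
              }
c = hv.Curve(df, 'year', 'count').opts(**chart_opts)
p = hv.Points(df, ['year', 'count']).opts(size=10)

# The * operator overlays the point markers on the line.
display(c * p)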

# 4. Make your notebook easy to read. 

Notice how I numbered the major headers in this notebook? That's one way to help readers keep their place as they scroll through. Kaggle also has a great feature that builds a table of contents off to the side of the notebook based on your headers. Even better!


Here are some other things you can try.

 - Use HTML tags and hierarchy to organize your notebook. Markdown is convenient, and it's also flexible because it allows HTML tags directly. Here's an example of custom HTML and CSS that makes section headers stand out. All you need is a style block at the top of your notebook and matching tags in the markdown cells.


In [None]:
# %%HTML

# This is the code for a code cell that sets the formats. Put it at the top of your notebook.

# <style type="text/css">

# div.h2 {
#     background-color: steelblue; 
#     color: white; 
#     padding: 8px; 
#     padding-right: 300px; 
#     font-size: 24px; 
#     max-width: 1500px; 
#     margin-top: 50px;
#     margin-bottom:4px;
# }

# </style>



# This is the actual HTML code you would put in a markdown cell. I can't do it here because
# it cuts the rest of the notebook off for some reason.

# <div class=h2>1. Section Title. </div>
# Blah blah blah.

 - Provide readable code. One of the best resources I've found is this Kaggle notebook [Six steps to more professional data science code](https://www.kaggle.com/rtatman/six-steps-to-more-professional-data-science-code) by Rachael Tatman. The sections on "readable" and "stylish" have great info.

 - Another thing you might consider is hiding your code (perish the thought!). Normally, code is what we're all about. Part of what makes Kaggle notebooks great is that we learn a ton from each other's code, and the last thing you probably want to do is hide it. But analytics reports are different: it's more important that readers get the overall story first. Once your report passes that first look, you can expect someone to dig into your code.
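
Coming back to the custom HTML headers in the first bullet: if you'd rather not retype the `<div>` markup for every section, you can generate it from Python. This is only a sketch. `H2_STYLE` and `section_header` are hypothetical names I made up for illustration, and whether a style injected from a code cell carries over to your markdown cells depends on the notebook front end, so test it where your report will actually be read.

```python
from html import escape

# CSS for the custom header, adapted from the commented example above.
H2_STYLE = """<style type="text/css">
div.h2 {
    background-color: steelblue;
    color: white;
    padding: 8px;
    font-size: 24px;
    margin-top: 50px;
    margin-bottom: 4px;
}
</style>"""


def section_header(number: int, title: str) -> str:
    """Build the HTML for one numbered section header (hypothetical helper)."""
    return f'<div class="h2">{number}. {escape(title)}</div>'


try:
    # Inside a notebook, render the style and a sample header as cell output.
    from IPython.display import HTML, display
    display(HTML(H2_STYLE + section_header(1, "Section Title")))
except ImportError:
    # Outside a notebook, just print the markup so it can be pasted into markdown.
    print(section_header(1, "Section Title"))
```

Keeping the markup in one function means every header gets the same structure, and escaping the title avoids stray `<` or `&` characters breaking the page.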

# 5. Pay close attention to the stated criteria.

Scores for analytics challenges are subjective by their very nature. Even so, you will need to meet the requirements stated in the competition. The criteria for the CDP competition are as follows:

Accuracy/Completeness

 - Did the author develop one or more key performance indicators (KPIs)?
 - Did the author provide a way of assessing the performance and accuracy of their solution?
 - Are the KPIs useful for discussing relationships between social issues and environmental issues and demonstrating whether city and corporate ambitions take these factors into account?
 - Do the KPIs accurately reflect the underlying data?

Communication

 - Does the notebook have a compelling and coherent narrative?
 - Does the notebook contain data visualizations that help to communicate the author’s main points?
 - Did the author include a thorough discussion on the intersection between environmental issues and social issues?
 - Was there discussion of automated insight generation, demonstrating whether city and corporate ambitions take these factors into account?

Documentation

 - Is the code documented in a way that makes it easy to understand and reproduce?
 - Were all external sources of data made public and cited appropriately?
 
 
And for NFL DataBowl:


Innovation
 - Are the proposed findings actionable?
 - Is this a way of looking at tracking data that is novel?
 - Is this project creative?
 
Accuracy
 - Is the work correct?
 - Are claims backed up by data?
 - Are the statistical models appropriate given the data?

Relevance
 - Would NFL teams (or the league office) be able to use these results on a week-to-week basis?
 - Does the analysis account for variables that make football data complex?

Clarity
 - Evaluate the writing with respect to how clear the writer(s) make findings.

Data visualization/tables
 - Are the charts and tables provided accessible, interesting, visually appealing, and accurate?
 
 
These competitions are still a battle of perception at the end of the day. You can be the best at each of the criteria and still finish behind someone who tells a good story, or at least tells a story that resonates with the audience.

# 6. Resources and Closing.

Here are some good report-style notebooks from past analytics challenges, IMO:



 - https://www.kaggle.com/thedatabeast/making-perfect-chai-and-other-tales/comments
 - https://www.kaggle.com/anshuls235/journey-of-ctds-show-through-visuals
 - https://www.kaggle.com/philippsinger/nfl-playing-surface-analytics-the-zoo
 - https://www.kaggle.com/gaborfodor/summary-budapest-pythons
 - https://www.kaggle.com/erikbruin/recommendations-to-passnyc-1st-place-solution
 
 
Favorite sources of good visualization:

- https://fivethirtyeight.com/
- https://www.nytimes.com/section/upshot
- https://flowingdata.com/
- http://www.storytellingwithdata.com/
- https://www.edwardtufte.com/tufte/


I hope these ideas are helpful and add to the quality of everyone's reports. I also hope to continue learning from this great community!