DataVisualization - Speed Dating analysis

Contributors: Antoine Boulat, Simon Delarue, Mohammed El Yaagoubi, Mathias Nourry, Lingli Zhan

Check the poster of the project here!

Table of contents

0. Usage
1 Project description
1.1 Data
1.2 Environment
2 Designs
2.1 Ratings analysis
2.2 Evolution of ratings with regard to personal characteristics
2.3 Match analysis

0. Usage

To run the web application, follow these steps, using the command lines below:

clone this GitHub repo and move to the cloned directory
launch the application script
go to http://127.0.0.1:5000/

$ git clone https://github.com/simondelarue/DataVisualization
$ cd DataVisualization/webapp
$ python3 app.py

1 Project description

1.1 Data

In this data visualization project, we analyse the Speed Dating dataset.

The dataset contains 8379 entries for 195 variables. However, almost 26% of the values in the whole dataset are missing, either because the data was not collected or because the field did not apply (for example, when candidates had to choose between specific items to fill in).

The data describes candidates from all around the world: gender, age and background (studies), but also what they expect from the speed-dating meeting, i.e. their goal. For each candidate, we also have the answers to the questionnaires they were given, covering how they see themselves, the attributes they rated on their scorecards for each person they met, and their feelings about the event. Most of this information is already encoded as numerical values; a smaller part of the variables is still qualitative.

The whole dataset covers 551 unique candidates, 49.94% female and 50.06% male. The data was gathered over 21 waves of speed dating.
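Figures like the ones above (share of missing values, number of unique candidates, gender split) can be recomputed with a few lines of Python. The sketch below is only an illustration on a tiny made-up sample, using the standard library. The column names `iid` (candidate id) and `gender` (0 for female, 1 for male), as well as the helper `summarize`, are assumptions and are not taken from the project code.

```python
from typing import Dict, List, Optional

Row = Dict[str, Optional[float]]


def summarize(rows: List[Row]) -> Dict[str, float]:
    """Return the share of missing cells, the number of unique candidates
    and the share of female candidates (assumed encoding: gender == 0)."""
    total_cells = sum(len(row) for row in rows)
    missing_cells = sum(1 for row in rows for value in row.values() if value is None)

    # A candidate appears on several rows (one per date), so deduplicate on the id.
    gender_by_candidate = {
        row["iid"]: row["gender"] for row in rows if row["iid"] is not None
    }
    n_candidates = len(gender_by_candidate)
    n_female = sum(1 for gender in gender_by_candidate.values() if gender == 0)

    return {
        "missing_share": missing_cells / total_cells,
        "unique_candidates": n_candidates,
        "female_share": n_female / n_candidates,
    }


# Made-up sample: two candidates, two dates each, one missing age.
sample = [
    {"iid": 1, "gender": 0, "age": 21.0},
    {"iid": 1, "gender": 0, "age": None},
    {"iid": 2, "gender": 1, "age": 24.0},
    {"iid": 2, "gender": 1, "age": 24.0},
]
stats = summarize(sample)
print(stats)
```

Counting candidates on the deduplicated ids rather than on rows matters here, because each candidate contributes one row per date.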

1.2 Environment

We used the following technologies for this project

2 Designs

For the purpose of the project, we first designed sketches of the different visualizations we wanted to implement. On this basis, we developed the final web application that gathers all the designs.

In the following sections, we present sketches as well as implemented designs.

2.1 Ratings analysis

When we are asked to judge and rate ourselves on subjective items such as attractiveness or fun, our own perception often differs from that of a third party. Yet is this hypothesis just a feeling, or can we measure this divergence? This visualization offers insight into the question through an interactive analysis based on the ratings given by candidates before, during and after the speed-dating exercise.

For this study, we propose a martini-glass approach: the user is first given a global idea of the answer, with a short analysis, and is then invited to interact with the tool to answer more precise questions that may have arisen during the first part.

The general overview of the initial sketch design was as follows.

Using Python for data preprocessing and D3.js to build the radar chart and add interactivity, the final visualization tool looks like the following for the user.

This tool lets the user filter data on different dimensions:

Gender
Question asked to candidate

Thanks to a time slider, the user can get an intuition of how candidates' answers evolve on topics such as "what they look for" or "how they think they measure up".

On the left of the tool, a small information textbox explains how to read the radar chart and provides a short analysis of the data initially displayed. In this information box, match on content is used to help the user recognize which features are analysed.
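The Python preprocessing that feeds such a radar chart can be pictured as averaging each rated attribute over the records that match the selected gender, question and time step. The sketch below is a minimal illustration under assumptions: the attribute names, the record layout and the helper `radar_axes` are hypothetical, and the sample values are made up.

```python
from statistics import mean

# Hypothetical names standing in for the six rated criteria.
ATTRIBUTES = ["attractive", "sincere", "intelligent", "fun", "ambitious", "shared_interests"]


def radar_axes(records, gender, question, time_step):
    """Average each attribute over the records matching the selected filters."""
    selected = [
        r for r in records
        if r["gender"] == gender and r["question"] == question and r["time"] == time_step
    ]
    axes = []
    for attribute in ATTRIBUTES:
        values = [
            r["ratings"][attribute] for r in selected
            if r["ratings"].get(attribute) is not None
        ]
        # None marks an axis without data, so the chart can skip it instead of plotting 0.
        axes.append({"axis": attribute, "value": mean(values) if values else None})
    return axes


# Made-up sample records.
records = [
    {"gender": "female", "question": "what they look for", "time": 1,
     "ratings": {"attractive": 20, "sincere": 20, "intelligent": 20,
                 "fun": 20, "ambitious": 10, "shared_interests": 10}},
    {"gender": "female", "question": "what they look for", "time": 1,
     "ratings": {"attractive": 10, "sincere": 30, "intelligent": 20,
                 "fun": 20, "ambitious": 10, "shared_interests": 10}},
    {"gender": "male", "question": "what they look for", "time": 1,
     "ratings": {"attractive": 40, "sincere": 10, "intelligent": 20,
                 "fun": 20, "ambitious": 5, "shared_interests": 5}},
]
axes = radar_axes(records, "female", "what they look for", 1)
print(axes)
```

A list of `{"axis": ..., "value": ...}` entries is one convenient shape to serialize as JSON for a D3.js chart; the actual format used by the web application may differ.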

2.2 Evolution of ratings with regard to personal characteristics

This data may interest a more specialised audience, such as sociologists and/or psychologists. That is why it sits on a separate webpage, and also why the approach changes.

The first part uses the martini-glass approach; in this section, since the user is an expert, a drill-down approach is preferred.

Having shown the difference between what people are looking for in the opposite sex and what people think their fellow men/women are looking for in the opposite sex, our goal is to provide more detailed information for all those who are interested in a more in-depth analysis. A short list of questions that our users may be interested in: Do the participants think they have very different rankings of the 6 criteria? Are these differences significant? Do they differ between men and women? How do they change with age?

Our objective is to provide elements so that users can make up their own minds. That is why we provide only figures from the dataset, without any interpretation or assertion. The design follows a more reader-driven approach. Sketch 2 supports tasks such as forming psychological hypotheses, profiling participants …

The general overview of this second sketch design was as follows.

Using Python for data preprocessing and D3.js to build the flower diagram and add interactivity, the final visualization tool looks like the following for the user.

This tool lets the user filter data on different dimensions:

Gender
Age range
Attributes

The implemented design is very similar to what was designed in the first place, with 2 exceptions correcting identified shortcomings:

On the upper left, the gender selector no longer contains “All”. The choice is limited to the female or male icon. We removed this option because of the size of our dataset: we have only 551 unique candidates. If the age range is too narrow, the selected population is very likely to contain only women or only men, which would result in misleading information.
The ticket on the lower right has been replaced by a text box. Before the change, only the second bar visualization provided information that was not already displayed in the flower diagram, i.e. “how the selected population think most of their fellow men/women have rated the criterion”.
This change freed up space and enabled us to add more valuable information. Users now see not only the rating of the selected population on the chosen criterion, but also several significant pieces of static information.
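The reason for dropping the “All” option can be made concrete by counting candidates per gender inside an age range. The sketch below is a hypothetical illustration on made-up candidates; the helper `gender_counts` and the field names are assumptions, not the project's code.

```python
from collections import Counter


def gender_counts(candidates, age_min, age_max):
    """Count candidates per gender inside the inclusive age range."""
    in_range = [
        c for c in candidates
        if c["age"] is not None and age_min <= c["age"] <= age_max
    ]
    return Counter(c["gender"] for c in in_range)


# Made-up candidates; one of them has a missing age and is never selected.
candidates = [
    {"iid": 1, "gender": "female", "age": 21},
    {"iid": 2, "gender": "female", "age": 22},
    {"iid": 3, "gender": "male", "age": 27},
    {"iid": 4, "gender": "male", "age": None},
]

narrow = gender_counts(candidates, 21, 22)  # a narrow range keeps women only
wide = gender_counts(candidates, 18, 40)    # a wide range keeps both genders
print(dict(narrow), dict(wide))
```

With the narrow range, an “All” selection would in fact describe women only, which is the misleading case described above.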

2.3 Match analysis

We all want to know the secret to success in the dating world. Every self-proclaimed love guru swears to know this mysterious formula that makes people desirable to others. In the age of data, a great hunch would be to use the speed dating data with over 8,000 observations of matches and non-matches, with answers to survey questions about how people rate themselves and how they rate others on several dimensions. And that’s exactly what we did!

On the left, a simple scatter plot shows a feature of your choice against the match ratio (each point represents one of the subjects of the study). For this first visualisation, we use D3 (scatter.js) for the scatter plot and our own regression function for the linear trend.
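A linear trend of this kind can be illustrated with an ordinary least squares fit. The project's own regression function is not reproduced here; the Python sketch below, with the hypothetical name `linear_trend` and made-up sample values, only shows the kind of computation such a function typically performs.

```python
def linear_trend(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of equal length with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical, the slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up points: a feature value (x) against a match ratio (y).
feature_values = [1.0, 2.0, 3.0, 4.0]
match_ratio = [0.2, 0.4, 0.6, 0.8]
slope, intercept = linear_trend(feature_values, match_ratio)
print(slope, intercept)
```

The two returned numbers are enough to draw the trend line between the smallest and the largest x value of the scatter plot.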

On the right, the solar correlation map represents the level of correlation of the different features with the match ratio. This diagram lets you easily see which features are highly correlated with the match ratio.
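The text does not say which correlation measure feeds the solar correlation map. Assuming a Pearson correlation, the values behind such a map could be computed and ranked as in the sketch below. The helpers `pearson` and `rank_by_correlation` are hypothetical, and the sample values are made up.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two samples of equal length."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of equal length with at least 2 points")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / sqrt(var_x * var_y)


def rank_by_correlation(features, target):
    """Sort features by the absolute value of their correlation with the target."""
    scores = {name: pearson(values, target) for name, values in features.items()}
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)


# Made-up values, for illustration only.
match_ratio = [0.1, 0.2, 0.3, 0.4]
features = {
    "attr_o": [5.0, 6.0, 7.0, 8.0],
    "imprelig": [4.0, 1.0, 3.0, 2.0],
}
ranking = rank_by_correlation(features, match_ratio)
print(ranking)
```

Ranking on the absolute value keeps strongly negative correlations near the top, which matches the idea of placing the most correlated features closest to the centre of the map.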

This approach is both sober and simple so that the tool is as accessible as possible. The following features, extracted from the Python file, are available:

int_corr, which quantifies the degree of correlation between the passions of the participants
attr_o, the attractiveness score of the candidate based on the scores of other participants
mn_sat, the candidate's grade on the exam
income, the average price of real estate in the area of residence, which roughly represents the person's salary
imprelig, the importance a subject gives to their partner having the same religion
go_out, which represents the frequency of partying
date, which represents the subject's dating frequency

You can select one of these features from a drop-down menu and observe its impact on the match ratio. You can also split the data between men and women.
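The text does not define the match ratio precisely. Assuming it is the share of a candidate's dates that ended in a match, it could be computed per candidate, and separately for men and women, as in the sketch below. The field names `iid`, `gender` and `match` and both helpers are assumptions, and the sample is made up.

```python
from collections import defaultdict


def match_ratios(dates):
    """Map each candidate id to (number of matches) / (number of dates)."""
    attended = defaultdict(int)
    matched = defaultdict(int)
    for d in dates:
        attended[d["iid"]] += 1
        matched[d["iid"]] += d["match"]  # match is 1 for a match, 0 otherwise
    return {iid: matched[iid] / attended[iid] for iid in attended}


def match_ratios_by_gender(dates):
    """Compute the per-candidate match ratios separately for each gender."""
    groups = defaultdict(list)
    for d in dates:
        groups[d["gender"]].append(d)
    return {gender: match_ratios(rows) for gender, rows in groups.items()}


# Made-up dates: one row per date attended by a candidate.
dates = [
    {"iid": 1, "gender": "female", "match": 1},
    {"iid": 1, "gender": "female", "match": 0},
    {"iid": 2, "gender": "male", "match": 0},
    {"iid": 2, "gender": "male", "match": 0},
    {"iid": 2, "gender": "male", "match": 1},
    {"iid": 2, "gender": "male", "match": 0},
]
ratios = match_ratios(dates)
by_gender = match_ratios_by_gender(dates)
print(ratios, by_gender)
```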
