\documentclass{article}

\usepackage{graphicx} % Required for the inclusion of images

\usepackage{titlesec}
\usepackage{titling}
\usepackage{fontspec}
\usepackage{setspace}
\usepackage{tabularx}
\usepackage{url}
\usepackage{enumerate}
\usepackage{float}

\usepackage{tikz}
\usetikzlibrary{plotmarks}

\onehalfspacing

\setlength{\droptitle}{-5em}
%\setlength\parindent{0pt} % Removes all indentation from paragraphs

% Specify different font for section headings
\newfontfamily\headingfont[]{Ubuntu}
\titleformat*{\section}{\LARGE\headingfont}
\titleformat*{\subsection}{\Large\headingfont}
\titleformat*{\subsubsection}{\large\headingfont}
\renewcommand{\maketitlehooka}{\headingfont}

%----------------------------------------------------------------------------------------
%	DOCUMENT INFORMATION
%----------------------------------------------------------------------------------------

\title{TDT4300 Datavarehus og datagruvedrift \\ Spring 2015 \\ Assignment 6: \textbf{Repetition}} % Title
\date{\vspace{-5ex}}

\begin{document}

\maketitle % Insert the title, author and date

%----------------------------------------------------------------------------------------
%%%	EXERCISE 1
%----------------------------------------------------------------------------------------

\section{Data Warehousing}

You are asked to create a data warehouse of traffic accidents in Norway in order to identify the arterial routes where improvements, lower speed limits, or similar measures would matter most to society. We consider only the direct costs of accidents and do not take injuries into account. The data come from various insurance companies and contain:

\begin{itemize}
	\begin{item}	
		When (date) and where the accident occurred (street and city, or road section and county).
	\end{item}
	\begin{item}
		Driver-related data (we are mostly interested in the age of the driver and whether the driver was intoxicated).
	\end{item}
	\begin{item}
		Type of insurance of the car and insurance fees.
	\end{item}
\end{itemize}

The data description is deliberately imprecise; part of the task is to decide which information must be included and how to express the facts of each accident. The main goal of the exercise is to practice modeling principles for data warehousing. State explicitly any assumptions you make.

\begin{enumerate}[(a)]
	\begin{item}	
		\textbf{Make a star or snowflake schema for this case description.}
	\end{item}
	\begin{item}
		\textbf{Define two different concept hierarchies (freely chosen dimensions).}
	\end{item}
\end{enumerate}

%----------------------------------------------------------------------------------------
%%%	EXERCISE 2
%----------------------------------------------------------------------------------------

\section{Association Rules}

Given the market basket transactions in Table~\ref{table:MarketBasketTransactions}, use the Apriori algorithm to generate all association rules that satisfy a minimum support of 0.5 and a minimum confidence of 0.8. \textbf{Describe the process and the outcome of each step thoroughly.}

\begin{table}
\begin{center}
\begin{tabular}{|c||l|}
  \hline
  \textbf{TID} & \textbf{Transaction} \\ \hline \hline
  1 & A, B, C \\ \hline
  2 & A, C \\ \hline
  4 & A, D\\ \hline
  5 & B, E, F \\
  \hline
\end{tabular}
\caption {Market basket transactions.}
\label{table:MarketBasketTransactions}
\end{center}
\end{table}
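
\noindent
As a reminder, and assuming the standard definitions, consider a rule $X \rightarrow Y$ over a set of $N$ transactions, where $\sigma(Z)$ denotes the number of transactions that contain every item of the itemset $Z$. Its support $s$ and confidence $c$ are then
\[
	s(X \rightarrow Y) = \frac{\sigma(X \cup Y)}{N}, \qquad
	c(X \rightarrow Y) = \frac{\sigma(X \cup Y)}{\sigma(X)}.
\]
The symbols $\sigma$, $s$, and $c$ are notation introduced here for illustration only; use the notation from the lectures if it differs. As a hypothetical example unrelated to the table, an itemset that occurs in 3 of 10 transactions has support $3/10 = 0.3$.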

%----------------------------------------------------------------------------------------
%%%	EXERCISE 3
%----------------------------------------------------------------------------------------

\section{Decision Trees}

A small computer retailer, which sells large computer equipment only to youth and students (hereinafter referred to as customers), wants to predict whether a customer should get a PC on credit. Table~\ref{table:Dataset} contains examples of decisions the company has made in the past. Assume that each customer record has the following five attributes:

\begin{table}[htp]
\begin{center}
\begin{tabular}{rl}
	\textbf{Age}: & \{Young, Middle, Old\} \\
	\textbf{Income}: & \{Low, Medium, High\} \\
	\textbf{Student}: & \{Yes, No\} \\
	\textbf{Creditworthiness}: & \{Pass, High\} \\
	\textbf{PC on Credit}: & \{Yes, No\}
\end{tabular}
\end{center}
\end{table}

\noindent
Your task is to answer the following questions:
\begin{enumerate}
	\begin{item}
		\textbf{Compute the Gini index for the entire training set (Table \ref{table:Dataset}).}
	\end{item}
	\begin{item}
		\textbf{Compute the Gini index for each attribute (Customer ID, Age, Income, Student, Creditworthiness).}
	\end{item}
	\begin{item}
		\textbf{Which attribute should be selected as a split attribute?}
	\end{item}
\end{enumerate}
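
\noindent
For reference, a sketch of the Gini definitions, assuming the standard formulation. For a node $t$ in which a fraction $p(i \mid t)$ of the records belongs to class $i$, and for a split of $n$ records into $k$ child nodes $t_1, \dots, t_k$ containing $n_1, \dots, n_k$ records,
\[
	\mathrm{Gini}(t) = 1 - \sum_{i} p(i \mid t)^2, \qquad
	\mathrm{Gini}_{\mathrm{split}} = \sum_{j=1}^{k} \frac{n_j}{n}\, \mathrm{Gini}(t_j).
\]
The names $t$, $t_j$, $n_j$, and $\mathrm{Gini}_{\mathrm{split}}$ are illustrative notation, not taken from the assignment. As a hypothetical example, a node whose records are split evenly between two classes has $\mathrm{Gini}(t) = 1 - (0.5^2 + 0.5^2) = 0.5$.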

Suppose we have the following two customers and want to predict whether they should get a PC on credit. \textbf{Explain how you would proceed.}

\begin{itemize}
	\begin{item}
		Customer \#21: A young student with medium income and ``high'' creditworthiness.
	\end{item}
	\begin{item}
		Customer \#22: A young non-student with low income and ``pass'' creditworthiness.
	\end{item}
\end{itemize}

\begin{table}[htp]
\small
\begin{center}
\begin{tabular}{|c|c|c|c|c|c|}
\hline
\textbf{Customer ID} & \textbf{Age} & \textbf{Income} & \textbf{Student} & \textbf{Creditworthiness} & \textbf{PC on Credit} \\ \hline
1 & Young & High & No & Pass & No \\ \hline
2 & Young & High & No & High & No \\ \hline
3 & Middle & High & No & Pass & Yes \\ \hline
4 & Old & Medium & No & Pass & Yes \\ \hline
5 & Old & Low & No & Pass & Yes \\ \hline
6 & Old & Low & Yes & High & No \\ \hline
7 & Middle & Low & Yes & High & Yes \\ \hline
8 & Young & Medium & No & Pass & No \\ \hline
9 & Young & Low & Yes & Pass & Yes \\ \hline
10 & Old & Medium & Yes & Pass & Yes \\ \hline
11 & Young & Medium & Yes & High & Yes \\ \hline
12 & Middle & Medium & No & High & Yes \\ \hline
13 & Middle & High & Yes & Pass & Yes \\ \hline
14 & Old & Medium & No & High & No \\ \hline
15 & Middle & Medium & Yes & Pass & No \\ \hline
16 & Middle & Medium & Yes & High & Yes \\ \hline
17 & Young & Low & Yes & High & Yes \\ \hline
18 & Old & High & No & Pass & No \\ \hline
19 & Old & Low & No & High & No \\ \hline
20 & Young & Medium & Yes & High & Yes \\ \hline
\end{tabular}
\caption{Sample dataset.}
\label{table:Dataset}
\end{center}
\end{table}

%----------------------------------------------------------------------------------------
%%%	EXERCISE 4
%----------------------------------------------------------------------------------------

\section{Data Types}

Classify the following attributes as binary, discrete, or continuous. Also classify them as qualitative (nominal or ordinal) or quantitative (interval or ratio). Some cases may have more than one interpretation, so briefly indicate your reasoning if you think there may be some ambiguity.

\noindent
\textbf{Example:} Age in years. \textbf{Answer:} Discrete, quantitative, ratio.

\begin{enumerate}[(a)]
	\begin{item}	
		Time in terms of AM and PM.
	\end{item}
	\begin{item}
		Brightness as measured by a light meter.
	\end{item}
	\begin{item}
		Brightness as measured by people's judgments.
	\end{item}
	\begin{item}
		Angles as measured in degrees between 0 and 360.
	\end{item}
	\begin{item}
		Bronze, Silver, and Gold medals as awarded at the Olympics.
	\end{item}
	\begin{item}
		Height above sea level.
	\end{item}
	\begin{item}
		Number of patients in a hospital.
	\end{item}
	\begin{item}
		ISBN numbers for books. (Look up the format on the Web.)
	\end{item}
	\begin{item}
		Ability to pass light in terms of the following values: opaque, translucent, transparent.
	\end{item}
	\begin{item}
		Military rank.
	\end{item}
	\begin{item}
		Distance from the center of campus.
	\end{item}
	\begin{item}
		Density of a substance in grams per cubic centimeter.
	\end{item}
	\begin{item}
		Coat check number. (When you attend an event, you can often give your coat to someone who, in turn, gives you a number that you can use to claim your coat when you leave.)
	\end{item}
\end{enumerate}

%----------------------------------------------------------------------------------------
%%%	EXERCISE 5
%----------------------------------------------------------------------------------------

\section{Autocorrelation}

Which of the following quantities is likely to show more temporal autocorrelation: daily rainfall or daily temperature? Why?
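
\noindent
One common way to quantify temporal autocorrelation, given here as a sketch under the assumption that the measurements form a series $x_1, \dots, x_n$ with mean $\bar{x}$, is the sample autocorrelation at lag $k$:
\[
	r_k = \frac{\sum_{t=1}^{n-k} (x_t - \bar{x})(x_{t+k} - \bar{x})}{\sum_{t=1}^{n} (x_t - \bar{x})^2}.
\]
The notation $r_k$ is introduced here for illustration. Values of $r_1$ close to 1 indicate that consecutive days tend to have similar values.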

%----------------------------------------------------------------------------------------
%%%	EXERCISE 6
%----------------------------------------------------------------------------------------

\section{Noise and Outliers}

Distinguish between noise and outliers. Answer the following questions.

\begin{enumerate}[(a)]
	\begin{item}	
		Is noise ever interesting or desirable? Outliers?
	\end{item}
	\begin{item}
		Can noise objects be outliers?
	\end{item}
	\begin{item}
		Are noise objects always outliers?
	\end{item}
	\begin{item}
		Are outliers always noise objects?
	\end{item}
	\begin{item}
		Can noise make a typical value into an unusual one, or vice versa?
	\end{item}
\end{enumerate}

%----------------------------------------------------------------------------------------
%%%	EXERCISE 7
%----------------------------------------------------------------------------------------

\section{Similarity Measures}

For the following vectors, $x$ and $y$, calculate the indicated similarity or distance measures.

\begin{enumerate}[(a)]
	\begin{item}	
		$x = (1, 1, 1, 1)$, $y = (2, 2, 2, 2)$: cosine, correlation, Euclidean
	\end{item}
	\begin{item}
		$x = (0, 1, 0, 1)$, $y = (1, 0, 1, 0)$: cosine, correlation, Euclidean, Jaccard
	\end{item}
	\begin{item}
		$x = (0, -1, 0, 1)$, $y = (1, 0, -1, 0)$: cosine, correlation, Euclidean
	\end{item}
	\begin{item}
		$x = (1, 1, 0, 1, 0, 1)$, $y = (1, 1, 1, 0, 0, 1)$: cosine, correlation, Jaccard
	\end{item}
	\begin{item}
		$x = (2, -1, 0, 2, 0, -3)$, $y = (-1, 1, -1, 0, 0, -1)$: cosine, correlation
	\end{item}
\end{enumerate}
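
\noindent
For reference, the measures can be written as follows, assuming the standard definitions for vectors $x = (x_1, \dots, x_n)$ and $y = (y_1, \dots, y_n)$ with means $\bar{x}$ and $\bar{y}$:
\[
	\cos(x, y) = \frac{\sum_{k=1}^{n} x_k y_k}{\|x\| \, \|y\|}, \qquad
	\|x\| = \sqrt{\sum_{k=1}^{n} x_k^2},
\]
\[
	\mathrm{corr}(x, y) = \frac{\sum_{k=1}^{n} (x_k - \bar{x})(y_k - \bar{y})}{\sqrt{\sum_{k=1}^{n} (x_k - \bar{x})^2} \, \sqrt{\sum_{k=1}^{n} (y_k - \bar{y})^2}}, \qquad
	d(x, y) = \sqrt{\sum_{k=1}^{n} (x_k - y_k)^2}.
\]
For binary vectors, let $f_{ab}$ be the number of positions in which $x$ has the value $a$ and $y$ has the value $b$; the Jaccard coefficient is then
\[
	J(x, y) = \frac{f_{11}}{f_{01} + f_{10} + f_{11}}.
\]
The symbols $d$, $J$, and $f_{ab}$ are notation chosen here for illustration; use the notation from the lectures if it differs.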

%----------------------------------------------------------------------------------------
%%% NOTES
%----------------------------------------------------------------------------------------

\section*{Submission Requirements}

Submit your solution as a \textbf{PDF} file to \textit{Blackboard}. You are allowed to \textbf{work in pairs}, in which case you have to select this option in \textit{Blackboard} when submitting your solution. Make sure that the document follows the usual conventions (name, assignment/task number, etc.).

%----------------------------------------------------------------------------------------

\end{document}

