### How to get started

This technical guide aims to present text mining techniques in a simple way to non-technical professionals. Although it was created with research support administrative staff in mind, other non-technical readers can use it as well.


The guide is divided into three sections: 
1. **How to get started:** The page you have just landed on is an introduction to this practical guide. It explains some basic tools that you will use to complete the examples in the following two sections, and what they do.
2. **Text and Data Mining fundamentals:** This section uses a full text research output extracted from [CORE](https://core.ac.uk/) and demonstrates some basic text mining practices, such as sentence segmentation, tokenization, part-of-speech tagging, named entity recognition and stemming. We also provide some exercises for you to practice.
3. **Examples using the CORE API:** This is a more advanced section. Some may feel that it requires some technical skills, while others will welcome the challenge and want to try the final practical exercise. The section uses the [CORE API](https://core.ac.uk/services#api) and demonstrates how to create word clouds from the University of Cambridge publications deposited in the institutional repository, Apollo.

#### Why Python Notebooks
A [Python Notebook](https://en.wikipedia.org/wiki/IPython) is an easy way to combine text and code on the same page and to build a narrative that needs both. Our examples need both, which is the main reason we used Python Notebooks to create this guide.

Each Python Notebook has two kinds of section: one with simple text, just like the one you are reading now, and another for code...


`code will have a different font than text`

#### How to read the following pages
If you were reading any other page, such as a blog post, an MS Word document or a webpage, you would probably just scroll up and down the page. Python Notebooks are a bit different. They have an order, and they work best when you use the "Play" button. The "Play" button is located on the left-hand side of each code field, and you should click on it each time you want to move to the next section. The "Play" button not only helps you move from one section to another, it also distinguishes the executable code from the plain text and executes it. If this is your first experience with Python Notebooks, the look and feel may seem odd, but you will soon get used to it.

The next block is a cell with code in it. Whenever you meet a similar block, you need to click on the "Play" button to run the code. As you go through this guide, you will see many blocks with code; you must run them by clicking the "Play" button, and you must do that in the order in which the code blocks appear. If you skip a code block, there is a high chance that the next code block will not run properly and you will receive an error message.

To run the next block, put your cursor on top of the square brackets [ ] and a "Play" arrow will appear; click on it.

In [0]:
print("Hello TDM!")

If you are seeing the message "`Hello TDM!`" then the code has run properly! 
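To see why the order matters, here is a minimal sketch of two blocks that depend on each other. In a notebook they would sit in two separate code cells; they are shown together here for brevity. The name `greeting` is our own choice for illustration and is not used elsewhere in this guide.

```python
# First block: store a message under the (hypothetical) name `greeting`.
greeting = "Hello TDM!"

# Second block: this line only works if the first block has already been run.
print(greeting)
```

If the second block were run on its own, before the first, Python would not know what `greeting` is and would stop with a `NameError`. Running the first block and then the second one again fixes the problem.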

#### Adding new lines and code

In section 3, "Examples using the CORE API", there is a practical exercise where you will need to add new lines and write some code in them. Here we briefly explain how to do that; we also provide the same guidelines in that specific section.
 
 - **Add a new line of code:** Adding a new line of code is easy. Click on "`[+] CODE`", located on the top-level navigation bar.
 - **Add a new line of text:** To add a new line of text, click on "`[+] TEXT`".

[Python](https://en.wikipedia.org/wiki/Python_%28programming_language%29) is a sophisticated and powerful programming language and you are not expected to know how to use it. The purpose of this demonstration is not to teach you Python either, so you will meet code that you will probably not understand. 

In the following examples we will only ask you to copy text from earlier blocks and paste it into your own blocks. You can copy and paste as you would anywhere else: select the text, then use Ctrl+C (Command+C) to copy and Ctrl+V (Command+V) to paste.


In [0]:
print("Hello TDM!")
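Once a line is pasted into your own block, you are free to change it. The sketch below is a hypothetical variation, not part of the later exercises: only the text inside the quotation marks is different, and the name `message` is our own choice.

```python
# Hypothetical variation of the pasted line: a different text inside the quotation marks.
message = "Hello again, TDM!"

# Show the new message on the screen.
print(message)
```

Everything between the quotation marks is treated as plain text, so you can replace it with any message you like without breaking the code.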

**Note:** This guide was created by non-technical people with the aim of putting together a guide for a non-technical audience. We cannot promise that you will understand every piece of the code (we don't either) or that you will not get error messages (we got plenty of them as well!). We learned a lot while creating it, and if we can do it, then so can you. We have to admit, though, that it sometimes helps to be able to contact a technical person to ask questions.

Ready to start with the next section, the "Text and Data Mining Fundamentals"? If so, click [here](https://drive.google.com/file/d/1tkSUG_X1m9wnQnwt5P16ypXuO-zEKxYo/view?usp=sharing). 



---




> ***A special thanks to the CORE developer, Matteo Cancellieri, who has helped us build this guide.***
